ABSTRACT

The  purpose  of  this  study  was  to  ascertain  current,  and 
anticipated  future,  military  requirements  for  integrated  circuit 
functions which necessitate higher speeds or circuit complexity
than  are  now  available.  An  attempt  was  to  be  made  to  define  a 
limited  class  of  large-scale  integrated  circuits  which  could 
satisfy  a  broad  range  of  DoD  military  system  needs  and  might 
therefore  be  developed  and  produced  as  standard  devices  in  large 
volume. 

Currently  operational  systems,  such  as  airborne  early 
warning  aircraft,  strategic  and  attack  submarines,  advanced 
fighter  aircraft,  air-to-air  missiles,  aircraft  and  satellite 
EM  surveillance  systems,  etc.,  contain  computationally  intensive 
subsystems.  In  these  systems,  the  physical  characteristics, 
performance,  or  total  contribution  to  life-cycle  costs  of  the 
integrated  circuit  assemblies  comprise  limiting  constraints  on 
the  performance  relative  to  cost  of  the  entire  system.  Future 
systems  of  these  types,  which  are  now  being  planned,  would  be 
infeasible  without  digital  processing  circuitry  far  more  advanced 
(higher  throughput  capacity  per  chip)  than  those  in  current 
military  or  commercial  use. 

The  computationally  intensive  subsystems  include  synthetic 
aperture radar (SAR) processing, inverse SAR (or PROFILE)
processing,  acoustic  beam  forming,  ELINT,  image  processing, 
adaptive  antenna  arrays,  communications  and  navigation.  Some 
of  the  algorithmic  processes  which  contribute  to  the  high 
computational  throughput  are  for  the  purposes  of  spectral 
analysis, encryption, error correction coding and decoding,
coordinate transformation, voice abstraction and synthesis,
signal conditioning, etc.

The throughput capacities (total gate cycles per second)
for various subsystems are estimated from fundamental analysis;
the overall system throughput capacity for future systems is
placed at a few billion operations per second (over 10^[exponent illegible] gate
cycles per second). In order to relate these throughput
requirements  to  systems  support  costs--which  are  governed 
principally  by  size,  power  consumption,  weight,  and  failure 
rate--a fairly basic analysis is presented of MOS circuit
throughput  in  relation  to  minimum  feature  size,  showing  the 
necessity  for  one  micron  (or  smaller)  design  rules  to  meet  the 
stated  performance  objectives. 

Several  specific  algorithmic  functions  are  identified  as 
the  source  of  the  high  computational  rates.  These  include  the 
FFT  "butterfly";  the  CORRELATE  operation--the  arithmetic  sum  of 
the  bit-by-bit  exclusive  OR  of  two  binary  sequences;  the 
evaluation  of  continuous  functions  such  as  the  logarithm, 
exponential  and  trigonometric  functions;  self-ordering  memories 
(sort  and  merge);  and  associative  memories. 
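
The CORRELATE operation described above can be sketched in a few lines. This is only an illustration, assuming the two binary sequences are packed into integers of a known width; the function name `correlate` and the `width` parameter are hypothetical and do not come from the study.

```python
def correlate(a: int, b: int, width: int) -> int:
    """Arithmetic sum of the bit-by-bit exclusive OR of two width-bit words.

    The result counts the bit positions in which the two sequences disagree.
    """
    mask = (1 << width) - 1            # keep only the low `width` bits
    return bin((a ^ b) & mask).count("1")


# Two 8-bit sequences compared bit by bit.
disagreements = correlate(0b10110100, 0b10011101, 8)
agreements = 8 - disagreements         # matching positions follow by subtraction
```

A general-purpose processor without such an instruction would need a loop or a lookup table to count the set bits, which is why a one-cycle hardware version is attractive for code acquisition and matched filtering.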

It  is  recommended  that  circuits  such  as  these  be  developed 
as  standard  hardware  macros. 

I.  SUMMARY,  CONCLUSIONS,  AND  RECOMMENDATIONS

The objective of this program was to estimate future military
needs for advanced integrated circuits and to identify
specific  integrated  circuit  functions  [at  the  very  large  scale 
level  of  circuit  integration  (VLSI)]  which  could  satisfy  a  broad 
range  of  DoD  military  system  needs.  Circuits  embodying  such 
functions  might  then  be  developed  and  produced  as  standard 
devices  in  large  volume.  The  purpose  is  to  provide  information 
for  support  of  the  Very  High  Speed  Integrated  Circuit  (VHSIC) 
program. 

A survey of military digital equipment currently in use
and of experimental and developmental programs involving significant
quantities of digital processing indicates rapid
growth in the use of integrated circuit assemblies. For
example,  the  avionics  of  future  advanced  early  warning  (AEW) 
and  antisubmarine  warfare  (ASW)  aircraft  systems  now  under 
consideration  would  require,  in  toto,  throughput  capacities  of 
several  billion  computer  operations  per  second,  which  is  an 
order  of  magnitude  greater  than  the  total  capacity  of  current 
computationally  intensive  aircraft  systems  (such  as  the  E-2C). 
Comparable,  if  not  greater,  throughput  capacities  are  being 
considered  for  satellite  surveillance  systems,  advanced  acoustic 
surveillance systems, and synthetic aperture radars for autonomous
tactical aircraft systems. Systems such as these would
be  infeasible  without  digital  processing  assemblies  far  more 
advanced  (higher  throughput  per  chip)  than  those  in  current 
military  or  commercial  use. 


In  some  experimental  or  proposed  military  systems  (category 
1)  the  integrated  circuit  (IC)  assemblies  play  an  essential  role 
in  the  mission  of  the  entire  system  and  the  life  cycle  system 
support  costs  are  high;  this  includes  the  attack  submarine; 
autonomous,  standoff,  tactical  aircraft  systems;  advanced  VSTOL, 
ASW  and  AEW  aircraft;  surveillance  and  communications  satellites; 
missiles  and  submunitions  of  various  types.  The  subsystems 
which  account  for  the  high  data  processing  throughput  are  radar, 
particularly  synthetic  aperture  radar  (SAR),  and  inverse  SAR 
(or  PROFILE);  acoustic  beam  forming,  spectral  analysis,  and 
target  signature  identification;  ELINT;  image  processing; 
adaptive  antenna  arrays. 

There  is  another  group  of  subsystems  (category  2)  consisting 
largely  of  IC  assemblies  whose  feasibility  does  not  hinge  on  the 
availability  of  higher  throughput  circuits,  but  which  are 
produced  in  large  quantities  and,  in  the  aggregate,  comprise  an 
important  economic  incentive  for  the  introduction  of  more 
advanced circuits. This group includes the joint tactical
information distribution system (JTIDS), global position
satellite (GPS) receivers, speech abstraction and synthesis
circuits, general-purpose computers, and IFF.

The  major  conclusions  of  this  study  are  these: 

(1)  Advanced  integrated  circuit  technology  (smaller 
than  one  to  two  micron  feature  size)  appears  to 
be  economically  justified  if  not  essential  for 
the  category  1  military  applications; 

(2)  All  categories  of  military  applications  require, 
or  would  materially  benefit  from,  certain  circuit 
types  which  are  not  now  being  produced  commercially; 
and 

(3)  Greater throughput capacity can often be achieved
through the use of specialized hardware macros
(see Section IV B, The Macro Substitution Sequence).

The recommendations of this study are that:

(1)  A  group  of  hardware  "macros"*  be  developed, 
initially  in  the  form  of  single  chips  or 
hybrids,  for  use  with  currently  available 
microprocessors  and  microprogrammable  bit 
slices;  when  more  refined  lithographic 
methods  are  perfected,  these  functions  could 
be  transferred  onto  the  processor  chip 
(functional  partitioning).  Several  specific 
functions have been suggested, such as a
one-cycle function calculator (SINE, EXP,
LOG, etc.); a self-ordering memory (automatic
sort and merge); an interpolator; an
FFT "butterfly"; etc.

(2)  Any general-purpose programmable processor
developed under government funding should

a.  Contain high throughput features, such
as two-argument machine instructions implemented
by two-port register stacks; parallel
fetch, decode, and execute cycles; separate
data and instruction busses; pipelined
processing.

b.  Execute  the  CORRELATE  instruction  (the 
numerical  sum  of  the  bit-by-bit  exclusive  OR). 

(3)  Advanced forms of field programmable logic circuits
should be developed.

These recommendations are not meant to exclude other candidate
circuits.


*The  term  "macro"  in  the  computer  literature  usually  signifies 
a subroutine, i.e., a block of instructions or an algorithm
frequently  needed;  a  hardware  macro  is  an  IC  or  small  assembly 
of  ICs  which  executes  a  macro--the  single-chip  multiplier  being 
the  preeminent  example. 

The availability of the recommended hardware macros would
extend  the  capabilities  of  the  microprocessor  and  the  bit  slice 
components  into  many  of  the  higher-performance  radar  and  sonar 
signal  processing,  communications,  and  ELINT  applications  while 
preserving  and  building  upon  existing  assets  of  software  and 
engineering  expertise.  Since  these  types  of  circuits  are 
traditionally  supplied  by  the  IC  manufacturers  (rather  than 
being  produced  by  vertically  integrated  systems  suppliers), 
this  approach  has  the  further  merit  that  the  circuits  would  be 
available  to  all  military  equipment  suppliers. 

This approach is an evolutionary one: it adds significantly
to the capability of existing technology now and, as more refined
IC technology becomes available, allows these macros to be embodied
on fewer chips and eventually integrated onto the processor
chip itself (functional partitioning).

The  selection  of  integrated  circuit  functions  ("product 
definition"  in  commercial  terms)  involves  a  considerable  variety 
of factors, technical, economic, and institutional. The OVERVIEW
(Section II) attempts to review these considerations, and
qualifies  the  preceding  conclusions  and  recommendations. 

II.  OVERVIEW

A.  CIRCUIT  SPEED,  THROUGHPUT  CAPACITY,  AND  SYSTEMS  SUPPORT 
COSTS 

The  physical  and  economic  attributes  of  microelectronic 
integrated circuits (IC) so exceed those of discrete transistor
components as to have far-reaching effects on military
system technology. Although integrated circuits already pervade
military electronic systems, further major improvements in
military  systems  capability  relative  to  cost  are  possible 
(Ref.  1),  even  within  the  limits  of  current  IC  practice,  while 
progress  in  the  underlying  IC  technology  still  has  far  to  go 
(Ref.  2). 

The potential contributions of integrated circuit technology
to the production cost, reliability, size, and weight
of military electronic equipment are frequently cited, but in
many applications the systems support costs (the marginal life-cycle
cost of the entire system attributable to its integrated
circuit assemblies) are of even greater significance (Ref. 3).
It is in military systems such as advanced fighter aircraft,
surveillance satellites, attack submarines, missiles of all
types, etc., where systems support costs are greatest, that the
large-scale integrated circuit finds its greatest potential
contributions. In fact, in some military equipment (Section B,
below) the physical characteristics, performance, or total
contribution to life-cycle costs of the integrated circuit
assemblies comprise limiting constraints on the performance
relative to cost of the entire system in which they are embedded.

On the whole, the potential for cost avoidance or performance
improvement through the application of current integrated
circuit  technology  has  been  estimated  at  several  billion  dollars 
for  weapons  systems  and  intelligence-gathering  systems  which 
are  now  in  advanced  development  or  in  the  early  stages  of 
production.  For  these  systems,  the  greatest  net  cost  savings 
would  be  achieved  (using  current  technology)  with  circuits 
having  an  average  level  of  integration  of  several  hundred  logic 
gates (Refs. 3, 4), corresponding to about 5 micrometer (µm)
feature  sizes.  But  even  while  military  systems  designers  and 
procurement  groups  experience  difficulty  in  exploiting  the  full 
benefits  of  current  IC  practice,  the  manufacturing  technology 
for  faster  and  more  densely  integrated  circuits  rapidly 
approaches  maturity  (Refs.  5,  6). 

The  objectives  of  this  study  are  to  examine  the  potential 
military applications of the high-performance integrated circuits
which could be produced using advanced manufacturing
technology  (specifically  submicron  feature  sizes)  and  recommend 
circuits  "that  could  satisfy  a  broad  range  of  DoD  military 
systems  needs,"  but  are  not  expected  to  be  commercially 
available. 

This  is  a  pivotal  issue,  that  of  "product  definition"  (in 
commercial  terms);  what  economically  feasible  functions  can  be 
defined  for  circuit  integration  at  the  100,000  gate  level? 

The commercial semiconductor industry examines this prospect
and notes that most present commercial circuits, except for
memory devices*, are not technology-limited in either speed or
circuit integration and, except for the 32-bit processor (and
the 16-bit processor with more on-board memory), no useful
non-memory products have been defined for the more advanced
technology.*

*Memory devices, on the whole, are the object of vigorous
commercial effort and are therefore generally excluded from
consideration under this study.

This may be true for the commercial sector, but when advanced
military systems concepts are examined, many applications
emerge in which the most advanced IC technology could
be exploited to advantage. Specific examples of such military
products  include  a  parallel  pipeline  processor,  with  on-board 
hardware  multiplier  and  sequencers,  capable  of  executing 
certain  logic  operations  encountered  in  message  coding  and 
decoding;  special  image  data  processors;  associative  (content 
addressable)  memory  elements;  frequency  synthesizers;  frequency, 
phase,  and  code  sequence  acquisition  circuits;  correlators, 
and  digital  matched  filters  with  high  processing  gain.  Other 
circuits  which  are  not  now  produced  commercially  but  would  be 
useful in military equipment include hardware "macros" (four
of which are suggested as candidates for development: a self-ordering
memory stack, a universal function calculator, an
interpolation circuit, and an FFT butterfly), and a general
logic circuit. All of these circuits are expandable in speed
and  density  as  technology  permits.  The  systems  applications 
of  these  and  other  military  IC  products  are  discussed  below. 
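
To make the FFT butterfly named above concrete, the following sketch shows the radix-2 butterfly as a standalone function and uses it inside a small recursive transform. It is an illustration under stated assumptions, not the circuit proposed in the study: the decimation-in-time arrangement and the names `butterfly` and `fft` are choices made here.

```python
import cmath


def butterfly(a: complex, b: complex, w: complex):
    """Radix-2 butterfly: one complex multiply, one add, one subtract."""
    t = w * b
    return a + t, a - t


def fft(x):
    """Recursive decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


spectrum = fft([1, 1, 1, 1])   # a constant input has energy only in bin 0
```

An N-point transform executes (N/2) log2 N butterflies, so the butterfly is the unit whose speed sets the throughput of the whole transform; that is the motivation for treating it as a hardware macro.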

*Gordon Moore, Ref. 7. Also, Robert W. Keyes, Ref. 8, sees no
commercial need for 1.5 µm circuits at the present time.
"Market economics do not justify these circuits in the time
frame of the VHSI program." See also Ref. 9.

In assessing the need for advanced, high-performance circuits
(i.e., high speed, ~100 MHz, and high circuit density, 10^[exponent illegible]
gates), those applications (such as real time electronic synthetic
aperture radar processing) which demand the highest
arithmetic throughput rate are the first to come to mind. But,
actually, all military systems which contain IC assemblies stand
to benefit from higher-speed ICs, since the throughput capacity
per circuit (the product of cycle frequency and number of
gates), more than any other single factor, influences the
system support costs for a given application. As it happens,
integrated circuits at 1 µm feature sizes with the high throughput
capacities are fast by present standards (Section V).
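
The definition of throughput capacity per circuit as cycle frequency times gate count lends itself to a one-line calculation. The sketch below is illustrative only: the 100 MHz clock is the figure quoted in the text, while the gate count of 100,000 is an assumption borrowed from the "100,000 gate level" mentioned earlier in this section, and the function name is hypothetical.

```python
def throughput_capacity(clock_hz: float, gates: int) -> float:
    """Throughput capacity in gate cycles per second (clock rate x gate count)."""
    return clock_hz * gates


# Assumed example chip: 100 MHz clock, 100,000 gates.
chip = throughput_capacity(100e6, 100_000)


def chips_needed(system_gate_cycles_per_s: float, per_chip: float) -> int:
    """Smallest whole number of chips that meets a system requirement."""
    return int(-(-system_gate_cycles_per_s // per_chip))   # ceiling division
```

Under these assumptions one chip supplies 1e13 gate cycles per second, so a requirement expressed in gate cycles per second translates directly into a chip count, and hence into size, power, and weight.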

In  general,  future  improvements  in  circuit  speed  and 
throughput capacity will have the greatest effect on performance
or cost for applications which require high throughput
(particularly  where  systems  support  costs  are  greatest)  or 
where  large  volumes  of  circuits  will  be  used. 

B.  HIGH  PERFORMANCE,  HIGH  SYSTEM  SUPPORT  COST  APPLICATIONS 

The attack submarines*, various advanced tactical aircraft
systems**, and certain surveillance satellites exemplify
military  systems  of  potentially  critical  importance  whose 
capabilities  are  dependent  on  massive  digital  processing  (of 
sensor  data)  and  in  which  systems  support  costs  are  relatively 
high.  These  applications  have  generated  integrated  circuit 
equipment  development  programs  and  are  frequently  cited  as 
justification  for  government  funding  of  the  most  advanced 
integrated  circuit  production  techniques.  In  fact,  the  various 
efforts  to  develop  real  time  high-resolution  radar  surveillance 
data  processors  for  autonomous,  stand-off,  air-to-ground  weapon 
delivery  systems  against  tanks  or  other  ground  vehicles  probably 
cannot result in deployable equipment until more highly integrated
circuits become available (see Section III B).

*The SSN-637 and SSN-688.

**Such as Assault Breaker, ATSR, TAWDS.

The signal- and data-processing system currently in
production or development for the Trident and attack submarines
contains roughly 10^[exponent illegible] equivalent gates of logic, and the total
deployment is currently placed at 60 to 100 systems for a total
of 10^[exponent illegible] or more equivalent gates of circuitry, including
life-cycle spares. However, although this equipment fills the available
deck space, it can still process only a small fraction of the
data incoming from the acoustic arrays (Section III A). The
advanced sonar array concepts* now in the experimental stage,
if deployed, would require about an order of magnitude more
circuitry--a commercially significant quantity.

The total potential demand for integrated circuit assemblies
in tactical aircraft systems is comparable. If the
operational  tests  eventually  show  that  tactical  ground  targets 
can,  in  fact,  be  reliably  detected  and  classified  from  useful 
ranges  by  synthetic  aperture  radar  (SAR),  with  a  resolution  of 
a  foot  or  so,  the  potential  demand  would  be  a  few  thousand 
systems,  each  containing  a  few  million  gates. 

The E-2C, E-3A, P-3, and S-3 are other aircraft systems
which contain conspicuous quantities of signal- and data-processing
equipment, and update programs are either planned or
in progress for all four.

Weapon delivery is the "bottom line" of (offensive) military
systems, and in torpedoes, missiles of all categories,
bombs,  and  even  shells,  data  processing  IC  assemblies  are 
becoming  more  elaborate  because  the  improvement  in  accuracy  and 
probability  of  successful  intercept  more  than  offsets  the  space 
given  over  to  more  refined  control  and  guidance  circuits  which 
would  otherwise  be  used  for  explosive.  For  example,  numerous 
programs  are  underway  to  improve  the  capabilities  of  air-to-air 
missiles  through  more  elaborate  signal  and  data  processing, 
including  higher  doppler  resolution  for  better  tracking  and 
clutter  rejection,  more  refined  tracking  algorithms  to  prevent 
track  breaking  by  crossing  targets,  side  lobe  clutter  rejection 
circuits,  wider  bandwidth  active  fusing,  reduced  jammer 
vulnerability,  and  so  on  (Refs.  10,  11). 


*Such  as  the  Advanced  Autonomous  Array. 


Further, several forms of autonomous terminal homing techniques
are under development in which (active or passive)
optical or IR sensor data are compared with abstractions from
long-range (sometimes stereo-optical) photographic reconnaissance
and cartographic data (Ref. 12). Although further algorithmic
development  is  needed,  these  approaches  promise  to  provide  high 
precision  ("zero  CEP")  terminal  guidance,  at  least  against  fixed 
targets.  The  guidance  is  computationally  intensive  and  the 
systems  support  costs  comparatively  high.  The  value  of  high- 
precision  terminal  homing  in  terms  of  missile  payload  stems 
from  the  disproportionate  increase  in  the  quantity  of  explosive 
needed  to  destroy  a  command  post  (for  example)  relative  to  the 
CEP  (Ref.  12). 

Even more extreme demands for signal processing and computational
capacities are found in satellite-borne radar and
electro-optical (EO) systems (Ref. 13)--where the largest systems
support cost coefficients would also be encountered.

C.  WHERE  LARGE  QUANTITIES  ARE  NEEDED 

The Global Position Satellite (GPS) receiver, Joint Tactical
Information Distribution System (JTIDS) terminals*, and
the Digital Voice Terminal (ANDVT) are examples of digital
processing equipment in which the eventual demand for large
quantities rather than system support costs is the primary
justification for an investment in more highly integrated circuits,
since unit cost of the integrated circuit components is
now a deterrent to their full-scale utilization. Man-pack
versions of all of these types of equipment are planned or
actually under development, with typical design goals of 6 W,
5 lb., and 100 cubic inches, which clearly necessitates the use of LSI
circuits throughout (Ref. 1, p. 11-143)**.


*And PLRS, the corresponding Army equipment.

**The Magnavox Manpack (GPS) weighs 30 lb. (excluding batteries),
consumes 29 W, and occupies 38[digit illegible] cubic inches.

The Linear Predictive Coding algorithm for voice abstraction
is arithmetic-intensive and cannot be executed in real
time on microprocessors currently in production, but the huge
commercial market that is thought to exist for these devices
when they can at last be embodied on a single chip (by flow-through
processing) is regarded as a major incentive for the
commercial development of more refined lithographic production
methods.

This  group  of  equipments  probably  comprises  the  largest 
source  of  demand  for  "next  generation"  integrated  circuits 
(perhaps  as  much  as  several  hundreds  of  thousands  of  pieces  of 
equipment,  a  few  million  integrated  circuits). 

D.  OTHER  APPLICATIONS 

Integrated  circuit  assemblies  also  occur  in  numerous 
categories  of  military  equipment  where  neither  the  direct  costs, 
systems  support  costs,  nor  integrated  circuit  performance  is 
regarded as a limiting consideration. In many of these applications,
the total cost and overall physical characteristics
are dominated by other than the digital portion of the system.
Nevertheless, at high levels of circuit integration useful
aggregate cost savings, performance improvements and, especially,
lower failure rates could be effected. We single out as
examples three categories of such equipment--communications,
ELINT, and general-purpose computers.

Pseudo  noise  (PN),  sequence-coding,  and  spread-spectrum 
techniques  are  widely  used  to  make  communications  secure 
against  interception  and  decoding,  IFF  and  radar  secure  against 
spoofing,  and  all  three  less  vulnerable  to  ECM  and  non-malicious 
interference.  Error  correction  coding  is  often  used  to  prevent 
loss  of  data  from  intermittent  fading  and  interference  (for 
example,  in  JTIDS,  which  operates  in  the  same  frequency  band  as 
TACAN).  At  the  receiving  end,  the  PN  sequence  must  be  regenerated 
and synchronously applied to the received signal (code stripping),
usually with the aid of special frequency and phase
acquisition circuits (such as Doppler wipe-off), and finally,
the syndrome and error correction* must be calculated. All of
these operations are performed by integrated circuit assemblies.
In many cases, custom LSI circuits have been developed, although
the digital synthesizer and Reed-Solomon decoder for some applications
can be emulated sufficiently by the more advanced
microprocessors (Motorola 68000).
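
The regeneration of the PN sequence and its synchronous application to the received signal can be sketched as follows. This is a toy illustration under several assumptions that are not in the study: a 4-stage shift register with feedback from stages 4 and 3, 15 chips per message bit, synchronisation already acquired, and a simple majority vote in place of syndrome-based error correction. All function names are hypothetical.

```python
def pn_sequence(taps, state, length):
    """Generate `length` chips from a Fibonacci linear-feedback shift register.

    `taps` are 1-indexed register stages whose XOR forms the feedback bit;
    `state` is the initial register contents (must not be all zeros).
    """
    reg = list(state)
    chips = []
    for _ in range(length):
        chips.append(reg[-1])           # output is taken from the last stage
        feedback = 0
        for t in taps:
            feedback ^= reg[t - 1]
        reg = [feedback] + reg[:-1]     # shift by one stage
    return chips


def spread(bits, pn, chips_per_bit):
    """Transmit side: XOR each message bit with its run of PN chips."""
    return [b ^ pn[i * chips_per_bit + j]
            for i, b in enumerate(bits)
            for j in range(chips_per_bit)]


def strip_code(chips, pn, chips_per_bit):
    """Receive side: XOR with the regenerated PN chips, then majority-vote."""
    bits = []
    for start in range(0, len(chips), chips_per_bit):
        ones = sum(c ^ p for c, p in zip(chips[start:start + chips_per_bit],
                                         pn[start:start + chips_per_bit]))
        bits.append(1 if 2 * ones > chips_per_bit else 0)
    return bits


message = [1, 0, 1, 1]
pn = pn_sequence(taps=(4, 3), state=[1, 0, 0, 0], length=15 * len(message))
received = spread(message, pn, 15)
for position in (3, 20, 21, 47):        # a few chips corrupted in transit
    received[position] ^= 1
recovered = strip_code(received, pn, 15)
```

The inner sum in `strip_code` is the same XOR-and-count step as the CORRELATE operation, which is why a fast hardware correlator benefits this whole class of receivers.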

The  intrinsic  security  and  potential  resistance  to  jamming 
(A/J)  of  radio  communications  are  both  related  to  the  processing 
gain--which  is  essentially  the  time  bandwidth  product  of  the 
coded transmission per message bit. In the commonest tactical
applications--such as SEEKTALK or SINCGARS for voice and low
data rate communications between ground forces or between aircraft
and ground control--the signal bandwidths are constrained
to  10  MHz  or  so  by  propagation  defects,  practical  limitations 
related  to  antenna  and  amplifier  efficiency,  and  regulations 
governing  spectral  utilization.  The  data  rates  may  be  as  high 
as  16  kb/s,  for  continuously  variable  slope  delta  modulation 
(CVSD)  of  voice  transmission,  which  effectively  limits  the 
processing  gain  to  30  dB  or  so.  Processing  gains  of  40  dB  are 
rarely  used. 
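
The figures in this paragraph can be checked with a short calculation. The sketch assumes that the time-bandwidth product per message bit is simply the signal bandwidth divided by the data rate; the function names are illustrative.

```python
import math


def processing_gain_db(bandwidth_hz: float, data_rate_bps: float) -> float:
    """Processing gain in dB, taken as bandwidth divided by data rate."""
    return 10 * math.log10(bandwidth_hz / data_rate_bps)


def required_bandwidth_hz(gain_db: float, data_rate_bps: float) -> float:
    """Bandwidth needed to reach a given processing gain at a given data rate."""
    return data_rate_bps * 10 ** (gain_db / 10)


cvsd_gain = processing_gain_db(10e6, 16e3)        # 10 MHz band, 16 kb/s voice
band_for_40_db = required_bandwidth_hz(40, 16e3)  # bandwidth for 40 dB at 16 kb/s
```

With a 10 MHz bandwidth and 16 kb/s CVSD voice the ratio is 625, or about 28 dB, consistent with the "30 dB or so" quoted. Reaching 40 dB at the same data rate would take roughly 160 MHz, well beyond the bandwidth constraint described.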

*Two steps are often identified in error correction: the syndrome
calculation, in which the presence of errors is determined,
and the actual error correction itself.

Although all of the services own large inventories of UHF
and VHF equipment, these have only limited A/J capability or
potential. For this reason, among others, several programs are
in progress to develop communication satellite systems (FLTSAT,
LEASAT, NASP, NCFS, DCS II, etc.) which would provide a far
greater utilizable spectrum and hence processing gain. The
larger processing gain, in turn, implies more signal processing
per message symbol. This and the high systems support costs
for satellites place a much greater premium on reduced feature
size in the integrated circuits. Designers of this type of
equipment can be expected to utilize the fastest and highest-capacity
circuits available within their bandwidth limitations.
Several advanced circuits are now under development for these
purposes (see Section III).

On  the  other  side  of  electronic  warfare  (ELINT),  signal 
interception  and  analysis  circuit  requirements  reach  their 
extremes  in  bandwidth  and  throughput  capacity. 

Aircraft  are  exposed  to  electromagnetic  emanations  from 
satellites  and  other  aircraft,  and  from  ground  stations  within 
an area of a half-million square miles or so. The total pulse
rate in all bands reaches 10^[exponent illegible]/sec over heavily culturated areas.
The first step in sorting out these emanations is that of
association, e.g., identifying pulse trains from a given transmitter
(on the basis of pulse modulation, carrier frequency,
direction, and so on). This associative process can be
materially facilitated by special content-addressable memory
circuits (such as the INTEL 3104, High-Speed 16-bit Content-Addressable
Memory), but far more powerful circuits of this type
are  needed  in  addition  to  various  other  forms  of  self-organizing 
memory  systems  (see  below). 
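
The association step can be imitated in software to show what a content-addressable memory would do in hardware. The sketch below is an assumption-laden illustration, not a description of any fielded ELINT processor: pulses are reduced to a time of arrival, a carrier frequency, and a bearing, and the bin widths and the name `associate` are invented for the example.

```python
from collections import defaultdict


def associate(pulses, freq_bin_mhz=5.0, bearing_bin_deg=2.0):
    """Group pulses into trains keyed by quantised carrier frequency and bearing.

    Each pulse is a tuple (time_of_arrival_us, carrier_mhz, bearing_deg).
    The dictionary lookup stands in for a content-addressable match.
    """
    trains = defaultdict(list)
    for toa_us, carrier_mhz, bearing_deg in pulses:
        key = (int(carrier_mhz // freq_bin_mhz),
               int(bearing_deg // bearing_bin_deg))
        trains[key].append(toa_us)
    return dict(trains)


# Interleaved pulses from two hypothetical emitters.
pulses = [
    (0.0, 9400.0, 41.0),
    (10.0, 2801.0, 120.2),
    (1000.0, 9401.0, 41.2),
    (510.0, 2802.5, 120.6),
    (2000.0, 9400.5, 41.1),
]
trains = associate(pulses)
```

A real emitter near a bin edge would be split across two keys by this fixed quantisation, which is one reason the text calls for more powerful associative and self-organizing memories than a simple exact-match lookup.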

Secure communications and signal intercept and analysis
build on the same technology, and the progression of this technology
fuels a ceaseless struggle to develop more effective low
probability of intercept (LPI) transmitters--through the use of
spread spectrum, time diversity, and more secure coding--and,
on the other side, signal intercept and analysis equipment to
keep pace. In this situation, IC design and production costs
and systems support costs are secondary considerations, the
primary motivation being performance.


E.  THE  PROGRAMMABLE  GENERAL-PURPOSE  PROCESSOR 


The preceding sections dealt with the effect of microelectronics
on systems support cost (the first category) and on
unit costs for high-volume usage in the second; the performance
and physical characteristics and cost benefits have been estimated
for both. There is yet another important avenue to
enhanced system performance in relation to cost, namely through
the versatility of the general-purpose programmable computer
which can perform a large range of functions--either on a
dedicated or time-shared basis--each of which would otherwise
be performed by dedicated special-purpose equipment; functions
such as, for example, navigation, fire control, communications,
engine and fuel control, aircraft flight control, etc. The
merits  of  programmable  processors  are  now  undisputed,  but  the 
degree  to  which  this  possibility  has  been  exploited  in  military, 
industrial,  commercial,  and  consumer  uses  could  not  easily  have 
been  imagined  before  the  introduction  of  the  microprocessor. 

In  many  applications,  microprocessors  are  embedded  in 
systems  (aircraft,  missiles,  etc.),  to  serve  as  special-purpose 
controllers  or  dedicated  processing  circuits.  In  the  lower  end 
of  the  performance  spectrum,  the  commercial  circuits  are  coming 
into  widespread  use  and  are  of  considerable  benefit  to  military 
systems. Furthermore, in recent years, the integrated circuit
manufacturers have shown a growing interest in high-performance
LSI (the AMD-2901 series, Texas Instruments 481, and INTEL 3000
series, the Fairchild 10K ECL series, and more recently, the
Fairchild ECL 8, all of which are bit-slice circuits; Ref. 14).
Moreover, the capabilities of microprocessors (TI SBP 9900,
INTEL 8086, Motorola 68000) have now advanced to the threshold
(16-bit words, 5 MHz clock) of high-performance applications.

These two distinct forms of general-purpose processor--the
single chip (MOS, I²L and CMOS) microprocessor, and the bit
slice families of bipolar circuits--derive their characteristics
from their underlying integrated circuit technologies; and the
future progress of these circuit technologies will, in turn,
affect the growth in capabilities of each of these classes of
processor and their relative future importance (see Subsection
I, below).

The single-chip microprocessor dominates the commercial
market and those military applications for which its throughput
is adequate, while the bipolar microprogrammable bit slice
families (such as the 2901 TTL and the 10K ECL series) must be
used in high-performance applications. The bipolar bit slice
processors are capable of three to ten times the throughput
rate of microprocessors because of the higher intrinsic speed
of the bipolar circuits and their microprogrammability. The
latter gives the system designer direct access to all the
processor's control points and hence the greatest operational
flexibility, but the instruction word becomes correspondingly
longer (often 80 to 100 bits as compared to 8 in the
microprocessor), and the overall programming cost per instruction
is correspondingly greater. However, blocks of microcode
(software "macros") have been developed which find repeated
applications, and computer-aided programming systems exist.

Over  the  foreseeable  future,  the  growing  use  of  both  of 
these  forms  of  programmable  processors  in  military  systems  seems 
desirable  if  not  inevitable  from  the  following  considerations: 

(1)  In general, the programmable processor components
(ALU, sequencers, MPY, ROM, etc.) have kept well
abreast of current technology, and their speed,
word size, physical characteristics, etc., can be
further extended and improved as circuit
technology permits;

(2)  Advanced forms of programmable processors with
richer repertoires of instructions (see Section
III D) would be capable of taking over most of
the functions now performed by special IC
assemblies (Refs. 15, 16);

(3)  The computational power of both the
microprocessor and the microprogrammable bit
slices could be substantially enhanced by
the development of a larger set of hardware
macros;

(4)  The  body  of  extant  software  and  the  teams  of 
development  engineers  skilled  in  the  use  of 
these  circuits  represent  a  substantial  asset 
of  the  military  electronics  industry; 

(5)  These  circuits  would  be  available  to  all 
military  equipment  suppliers. 

Progress  in  lithography  and  other  aspects  of  chip  design 
and  manufacturing  will  provide  the  means  for  producing  chips  at 
the  100,000  gate  level  of  complexity  operating  at  100  MHz  or 
faster.  This  opens  up  an  exciting  prospect  of  very  high 
throughput rates. The chip architectural features of
programmable general-purpose processors which might fully exploit this
capability have been energetically discussed and debated, but
all proposed approaches are notable for the use of several
layers of pipelining and concurrent operation which would
necessitate the development of entirely new software systems.
Although the physical characteristics of the processor elements
and hardware macros might be improved without altering chip
architecture  or  software,  substantial  increase  in  speed  cannot 
be  achieved  without  also  decreasing  the  interchip  signal  delay, 
either  by  lowering  the  voltage  swings  at  the  board  level  or  by 
using  transmission-line  techniques. 

F.  "WHO  CAN  SAY  WHAT  IS  POSSIBLE?"  -  Kepler 

New technology creates its own applications and very likely
future advances in microelectronics will find important military
applications which are not now foreseen. Microelectronics
technology plays a central role in a process regarded by some
as one of the principal developments in man's cultural
evolution, a process remarkably foretold by Norbert Wiener (Ref. 17),
which is really nothing less than the externalization of human
intelligence, and "VLSI is the key to ... machine intelligence"
(Ref. 18). The eventual consequences of this process with
respect to the instruments and practices of war cannot be
underestimated with impunity. Already, integrated circuit assemblies
figure  prominently  in  the  effective  application  of  military 
force--in  the  location  and  identification  of  an  enemy's  military 
resources,  weapon  delivery,  damage  assessment,  and  reporting. 

G.  COMMONALITY  AND  THE  MILITARY'S  PECULIAR  NEEDS  FOR  ICs 

The  guiding  star  of  the  IC  industry  has  been  high  production 
volume  (due,  in  part,  to  the  critical  effect  of  process  control 
on  yield),  and  one  factor  which  determines  the  total  demand  for 
a  circuit  is  commonality--the  number  of  different  applications 
for  which  it  is  suitable.  The  economic  importance  of  commonality 
actually begins with circuit design and development and, for the
military at least, extends into life-cycle logistics and
operational support costs where it affects parts count, special test
equipment, personnel training, and hence logistics and
operational support failure rates. Searching out the possible basis
of commonality in military IC applications is a difficult task
(for which the IC manufacturer usually lacks the necessary
background). For example, the Standard Avionics Modules (SAM)
and MFBARS efforts by the Air Force Avionics Laboratory address
the design of a common processor or modem set for GPS, JTIDS,
TACAN, possibly narrow band HF, VHF, UHF systems (Ref. 16), air
traffic control, navigation, and instrument landing systems
(GLIDESLOPE, VOR/ILS, ADF, transponder, etc.). Other studies
address the design of a general-purpose avionics signal
processor (Ref. 19) and advanced processors for a variety of
satellite applications (Ref. 13).

Military signal processing makes special and extreme
demands beyond those of any other known application and the
overriding importance of commonality points steadfastly in the
direction of programmable signal processing (Ref. 20, pp. 41,
44). Such equipment already exists (the Advanced Signal
Processor, for example), embodied mostly in MSICs, and several
programs are under way to develop military equipment using LSI
technology (the improved spectrum analyzer for the BQQ-5) and
advanced circuitry (MVP), and to study the architectural
features suitable for future systems (AOSP).

The microprocessor is a triumph of commonality, but for
the high-performance applications which require bipolar
technologies that (until now) precluded integration above a few
hundred gates, nothing comparable has emerged; instead, a
family of special-purpose bit slices (ALUs, sequencers),
various gate arrays, and field programmable logic arrays (FPLA)
are available. As yet, commonality in the high-performance
technologies has eluded us.*

The  gate-array  (or,  generically,  the  master-slice)  circuits 
consist  of  fixed  patterns  (arrays)  of  logic  gates  on  a  substrate, 
a  standard  circuit  except  for  the  final  layer  or  two  of 
customized  interconnections.  To  be  sure,  the  circuit  density 
falls  short  of  the  comparable  customized  circuit  (often  by  one 
half),  but  the  cost  and  schedule  are  smaller  by  a  factor  of  ten 
(typically)  and  software  tools  (including  "macros")  facilitate 
the  translation  of  logic  designs  to  interconnect  patterns. 

However,  the  master-slice  resources  generally  cannot  be 
utilized  fully  and  when  larger  gate  arrays  (1000  or  more  logic 
gates)  are  attempted,  several  layers  of  interconnects  are  needed 
and  the  available  number  of  I/O  pins  may  become  restrictive. 


*"LSI ... is slow to penetrate random logic. How will LSI
handle its major challenge: customize chips for logic
applications?" (Ref. 22.)

For these reasons, a study had previously been undertaken
(Ref. 21) to find a circuit configuration which could be
programmed (either by loading program registers with the proper
bit sequences, or by blowing fuse links) to emulate arbitrary
digital functions. That study led to several conceptual
designs, designated as General Logic Circuits.

The general logic circuit contains more powerful
programmable logic elements (embodying canonical functions of five or
six variables) and a fixed interconnect pattern with
programmable cuts and ties. The use of such logic elements in place
of simple gates eliminates (typically) two-thirds of the
interconnections that would be required by an equivalent gate array,
thus extending the level of integration that could be reached
before the use of multiple layers of interconnects becomes
imperative. It has been observed that chip design costs
increase very rapidly with chip complexity (Refs. 9, 6) and
because of this, and the interconnection problem at the VLSI
level, a breakthrough in chip design methodology or architecture
is needed.

H.  UNCOMMONALITY

Major investments have been made by several corporations
in facilities whose purpose is diametrically opposed to
commonality; namely, that of minimizing the cost and schedule for
designing and producing small quantities of special-purpose
circuits.

Conceivably, advances in computer-aided design (CAD) and
production technology (software-driven, direct-write, E-beam
lithography) could eventually make it possible to design and
produce new VLSICs in a matter of hours at small incremental
cost (smaller, say, than the cost of developing a printed circuit
board for an equivalent MSI assembly), in which case the cost
and schedule objections to custom VLSI would disappear and the
out-year problems of logistics and operational support may
prove to be no worse than those for the equivalent board
assemblies. For military purposes, however, the widespread
use of custom circuitry implies large outlays not only for
capital equipment but also for qualification, documentation,
logistics, and operational support, and it is doubtful that
military requirements--in their totality--could support even
one such facility, which raises the question of who would own
and operate it and how it could be made available to the
numerous suppliers of military equipment.

A choice, by decree, between the pursuit of the greatest
commonality or low-cost custom technology for military
applications is not likely to be forthcoming. More likely than not,
these  two  developments  will  continue  their  diverse  ways  until 
the  weight  of  experience  rules  out  one  or  the  other.  For 
signal-processing  applications  in  which  multiple  layers  of 
pipelining  impose  no  throughput  penalty,  progress  would  seem 
to  be  on  the  side  of  commonality;  for  when  circuit  integration 
reaches  the  level  that  substantial  memory  can  reside  on  the 
same  chip  with  the  processor,  prefetching  and  other  complicated 
timing  operations  can  be  eliminated  without  incurring  speed 
penalties,  thus  reducing  the  relative  performance  advantage  of 
the  custom  circuit. 

The  author  is  of  the  opinion,  based  on  institutional  as 
much  as  financial  and  technical  considerations,  that  the 
pursuit  of  commonality  is  the  better  course  for  the  Department 
of  Defense. 

I.  SIMPLE,  FAST,  AND  HOT;  OR  BIG,  SLOW,  AND  COOL? 

Today a remarkable variety of integrated circuit techniques
occupy their own competitive niches: the VLSI technologies are
NMOS and I²L which operate--with today's design rules--at 5 MHz
to 10 MHz, usually dissipating a fraction of a watt per chip,
while the fastest circuits (ECL and other forms of current
switching logic) with only a few hundred gates dissipate ten
times that power--and board designers contemplate methods of
cooling which might eventually permit 10 or 20 W dissipation
per chip for higher speeds. CMOS achieves intermediate circuit
densities and speeds, but with lower relative power consumption.

The  economic  survival  of  these  classes  of  circuits  side- 
by-side  seems  less  surprising  when  it  is  observed  that  their 
throughput  capacities  (with  comparable  design  rules)  are 
remarkably  similar.  The  lower  design  cost  of  the  simpler  high 
speed  circuit  is  offset  by  the  higher  production  cost  per 
function  and  the  board  and  chip  costs  involved  in  cooling. 

Will  future  advances  in  integrated  circuit  technology  upset 
this  balance  in  favor  of  one  or  the  other? 

The  answer  to  this  question  is  by  no  means  obvious.  The 
design  cost  per  function  seems  to  follow  the  salaries  of  the 
IC  design  teams  (Ref.  7)  independent  of  the  level  of  integration. 
Unless this trend were to be reversed by the development of new
design  automation  tools,  the  VLSI  approach  would  continue  to  be 
economically  viable  for  the  IC  manufacturer  only  if  the  market 
could  absorb  the  proportionately  higher  design  costs.  It  is 
not  clear  what  form  the  100,000  gate  circuit  might  take  whose 
commonality  of  purpose  would  create  a  market  that  would  justify 
such  an  investment  (Refs.  7,  23). 

The  military  applications,  in  themselves,  are  an  unlikely 
source  of  demand  from  the  viewpoint  of  the  IC  manufacturer. 
(Military  systems  buyers  are  unable  to  justify  the  purchase  of 
custom  circuits  at  the  LSI  level,  and  the  cost,  schedule,  and 
risk  rise  at  least  in  proportion  to  the  level  of  circuit 
integration.)  The  economic  justification  for  VLSI  aside,  there 
remain  problems  related  to  speed  and,  hence,  throughput.  The 
potential  speed  of  circuitry  increases  with  reductions  in 
feature size (see Section III, below), but the interchip delays
in themselves limit circuit speed to 50 MHz or so unless lower
voltage levels (< 5 V) are adopted and noise margins
proportionately lowered or transmission line techniques are employed.*
The final practical road blocks to the use of VLSI are pin out
and interconnects. The requirements for ever larger numbers of
pins, voltage matching stages, and line drivers at higher levels
of integration promise to recede only when entire systems can be
embodied on a single chip. As for on-chip interconnections,
empirical evidence indicates that multiple levels of
metallization are necessary in the VLSI region--possibly four levels
at 100,000 gates--implying a serious loss of yield with today's
production technology.

The computer industry examines the available alternatives
and elects to use the faster, simpler, current switching
circuits--accepts their higher cost per function and the onerous
board design problems in the bargain (Ref. 24). One
consideration favoring this choice is the adaptability of the current
switching transistors to the gate array (master slice)
approaches, because of their near indifference to interconnect
loading.

Economics  and  technology  favor  the  higher  speed,  simpler 
circuits,  except  in  the  consumer  market--where  unit  cost 
becomes  a  dominant  consideration--and  in  military  environments 
where  the  limited  temperature  range  of  these  circuits  may  be 
unacceptable. In those military systems where weight and
volume become paramount, the big, slow, and cool VLSI approach
may be economically preferable, but there will be an interval--
possibly a long one--during which the more advanced bipolar
circuits will play a growing role in high-performance military
applications, particularly with the availability of high-speed,
special-hardware macros. The duration of this interval will
depend, in part, upon the obduracy of a host of technical
obstacles which stand in the way of VLSI, beginning with the
silicon ingot (100 Ω-cm resistivities, 500 μsec minority
carrier lifetime), and the need for an optically flat surface,
defect gettering mechanisms, etc., extending into lithography
(nondestructive, automatically aligned), fabrication (dielectric
and resist imperfections, multilevel interconnections, dopant
control, metal-to-semiconductor contacts which chemically
misbehave when scaled to micron dimensions), etc. (Ref. 18).
Finally, packing, chip design, testing, and fault (or failure)
tolerance all grow in difficulty with chip complexity.

*The use of networks of transistors operating in series to
embody logic functions of several variables brilliantly avoids
this problem.

J.  THE  MARGINAL  RETURN  ON  INVESTMENT 

The justifiable level of investment in integrated circuit
technology depends on the potential for cost avoidance or the
potential value of performance improvements and must take
account of resource limitations (such as competent IC design
teams; Ref. 20, p. 85) which have no immediate monetary
equivalent.

The study of resource allocation limitations does not fall
within the scope of this program and the comparative value of
the performance improvements (e.g., extending the detection
range of a SONAR array in an attack submarine) must be decided
at the highest levels of DoD management. But the potential for
cost avoidance within given performance demands is subject to
approximate quantification and is manifestly finite (limited
to the total marginal cost of all current systems which is
attributable to the integrated circuit assemblies). On the
other hand, there is no finite capital investment which could
totally eliminate these costs. Therefore, the marginal return
on capital expenditures for IC technology becomes zero at some
level of performance* (once a system has been reduced to a
single chip, little can be gained from further reduction in
circuit size). An analysis of the total lifetime costs of the
F-18 attributable to its integrated circuit assemblies shows
that the greatest return on investment (in more compact circuits)
occurs at the level of 300 to 500 gates per IC, provided the
system's functions are not altered (Ref. 4). This supports the
conclusion reached in Ref. 3 concerning military systems
generally. The economic justifications for VLSI must be sought
in more advanced systems than those currently deployed (such as
the Assault Breaker type of system), in elevated performance
goals, or in circuits with a broader commonality than a single
military system (permitting wider distribution of IC
development costs).

*This is shown schematically in Fig. 1.

FIGURE 1. Schematic representation of return
on investment in IC technology

Although the total potential for cost avoidance, taking
into account possible commonality over many systems, might be
estimated for systems now in engineering development or
production, this is clearly impossible for future military systems
developments which will feel the full economic impact of the
VHSI program. Nevertheless, the relationship between total
circuit demand and the level of circuit integration is clarified
by Fig. 2, which shows the (approximate) total number of gates
(in several military assemblies and systems) and their average
clock speed. On the same figure, the estimated NMOS performance
is shown for a range of design rules. From this it can be seen
at what design rules the various systems could be integrated
into a single chip. Of the data processing subassemblies shown
on this graph, all but the BQQ-5 could be reduced to a single
chip using about 2 μm design rules; but in all cases the systems
must be redesigned to operate at much higher on-chip clock
speeds in order to achieve the computed throughput.

Unfortunately, the initial investment in manufacturing
technology required to achieve these various design rules is
hardly capable of even crude estimation at this time;
nevertheless, an investment of $100 million to $200 million by the
DoD in IC circuit development and advanced technology seems
well justified by the potential for cost avoidance and
performance improvements. The estimated potential for cost avoidance
exceeds 100 to 1 over the next 10 to 20 years (Ref. 3). The
eventual value of the enhancement in military systems
performance is inestimable, since the most extreme IC performance
requirements (0.5 μm to 1.5 μm feature sizes) occur in
applications whose operational capabilities are not yet fully
understood.

FIGURE 2. The approximate total throughput
for several military systems and
the estimated NMOS performance
for fixed total power dissipation (300 MW)


III.  MILITARY  SYSTEM  REQUIREMENTS  FOR 
HIGH  THROUGHPUT  PROCESSORS 


In this section, several categories of military signal-
and data-processing equipments are discussed, including sonar,
radar, ELINT, GPS, JTIDS, and speech abstraction. In some
cases, computational rates are given directly in terms of the
system performance goals, which show the necessity for very
high throughput processors even when the most powerful
algorithms are used. In each case, the usefulness of special
circuitry is discussed.

Since the fast Fourier transform (FFT) figures prominently
in several of these applications, formulas for memory
access rates and computational rates of the FFT will be
reviewed. In the radix 2 FFT, N complex additions and N/2
complex multiplications are performed at each of the log₂N
stages of the FFT. Each complex addition involves two real
additions, and each complex multiplication involves
two real additions and four real multiplications, although in
some cases the multiplier will be ± j or ± 1. (Multiplication
by ± j amounts to a relabeling of the multiplicand.) Thus,
something less than 3N log₂N real additions and 2N log₂N real
multiplications are performed. The computational rate C is
proportional to N log₂N and inversely proportional to the time
period available for completing the FFT.
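
The operation counts quoted above can be tallied directly. The following is a minimal sketch in Python; the function names are illustrative and are not taken from the study, and the counts are the upper bounds (no credit is taken for trivial multipliers such as ± 1 or ± j).

```python
def radix2_fft_real_ops(n):
    """Upper-bound real operation counts for a radix-2 FFT of n complex points.

    Returns (real_additions, real_multiplications).
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1              # log2(n) stages
    complex_adds = n * stages                # N complex additions per stage
    complex_mults = (n // 2) * stages        # N/2 complex multiplications per stage
    # A complex addition is 2 real additions; a complex multiplication
    # is 4 real multiplications plus 2 real additions.
    real_adds = 2 * complex_adds + 2 * complex_mults    # = 3 N log2 N
    real_mults = 4 * complex_mults                      # = 2 N log2 N
    return real_adds, real_mults


def computational_rate(n, period_s):
    """Real operations per second if one n-point FFT must finish every period_s seconds."""
    real_adds, real_mults = radix2_fft_real_ops(n)
    return (real_adds + real_mults) / period_s
```

For a 1024-point transform this gives 30,720 real additions and 20,480 real multiplications, that is 3N log₂N and 2N log₂N with N = 1024.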

These computational rates may be equated to throughput
capacity (T) by a relation between multiplier gate count (G)
and word size (B), such as G ≈ 25B² (Ref. 25) from which

$$T \approx 25B^{2}C \ \text{gate cycles/sec.}$$

The gate count for addition is much smaller, G ≈ 18B (Ref. 26).
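
As a rough illustration of how a multiplication rate converts to throughput capacity, the sketch below applies the gate-count relations quoted above (about 25B² gates for a multiplier and about 18B for an adder). The function names, the 16-bit word size, and the rate of one million multiplications per second are assumptions chosen only for the example.

```python
def multiplier_gates(word_bits):
    """Approximate multiplier gate count, G ~ 25 * B**2."""
    return 25 * word_bits ** 2


def adder_gates(word_bits):
    """Approximate adder gate count, G ~ 18 * B."""
    return 18 * word_bits


def throughput_gate_cycles(word_bits, mults_per_second):
    """Throughput capacity T ~ 25 * B**2 * C, in gate cycles per second."""
    return multiplier_gates(word_bits) * mults_per_second


if __name__ == "__main__":
    bits = 16                      # assumed word size
    rate = 1_000_000               # assumed multiplications per second
    print(multiplier_gates(bits), adder_gates(bits))
    print(throughput_gate_cycles(bits, rate))
```

With these assumed inputs the multiplier is about 6,400 gates against 288 for the adder, which is why the multiplier term dominates the throughput estimate.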

With respect to the number of transfers of data from memory
to working registers, if the processor contains 2R registers
(R complex numbers), from which operands can be drawn, then the
first log₂R stages of the FFT can be executed without
intervening transfers between memory and the working registers.

The simplest procedure is to reload the working registers from
memory and repeat the process N/R times before the stages
beyond the log₂R stage are started. For the remaining
log₂N - log₂R stages, data transfers are necessary at every
stage and for all the N/R groups of data. Thus, the total
number of transfers between memory and the working registers is:

$$F = \frac{N}{R}\left[1 + \log_2\left(\frac{N}{R}\right)\right].$$
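
The transfer count can be checked by adding up the reloads pass by pass. This is a small sketch under the assumption that N and R are powers of two and that one "transfer" means one reload of the full set of working registers; the function name is illustrative.

```python
def register_transfers(n_points, r_complex):
    """Reloads of the working registers for an n_points FFT with room for r_complex values.

    Implements F = (N/R) * (1 + log2(N/R)).
    """
    for value in (n_points, r_complex):
        if value < 1 or value & (value - 1):
            raise ValueError("both arguments must be powers of two")
    if r_complex > n_points:
        raise ValueError("r_complex must not exceed n_points")
    groups = n_points // r_complex            # N/R groups of data
    later_stages = groups.bit_length() - 1    # log2(N) - log2(R) remaining stages
    first_pass = groups                       # one reload per group covers the first log2(R) stages
    remaining = later_stages * groups         # every later stage reloads every group
    return first_pass + remaining
```

For N = 1024 and R = 16 there are 64 groups and 6 remaining stages, so the count is 64 + 6 x 64 = 448 reloads.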


Prior to the introduction of single-chip, high-speed
multipliers, multiplications were so dominant a consideration in
signal  processor  applications  that  the  entire  system  was  often 
characterized  by  its  multiplication  rate,  but  this  is  no  longer 
the  case;  rather,  memory  access  time  is  often  a  limiting  factor-- 
at  least  in  the  execution  of  the  FFT.  When  circuit  integration 
reaches  the  level  that  the  cache  for  storing  the  entire  block 
of  data  can  reside  on  the  same  chip  with  the  processor  (Ref.  15), 
the  potential  for  an  improvement  in  power  and  reliability  exists 
with  the  elimination  of  pre-fetch  operations.  This  can  occur  in 
the 1¼ μm region with 30K or so equivalent gates and 100K bits
of  memory  on  the  chip. 

The following are some of the more concise and interesting
results  given  in  this  section  for  the  uses  of  the  FFT. 

For sonar, the rate of multiplications and additions

$$C = 2W\,\frac{\Omega}{\theta}\left(\log_2\frac{W}{\delta f} + \log_2\frac{\Omega}{\theta}\right),$$

where W is the total bandwidth and Ω the total angular sector
being monitored, δf the spectral resolution, θ the angular
resolution. The total amount of data involved in each FFT

$$M = 2\,\frac{W}{\delta f}\,\frac{\Omega}{\theta}\,B,$$

when B is the word size.

For synthetic aperture radar

$$C = \frac{2\,\Delta R\,V}{\rho^{2}}\,\log_2\frac{R\beta}{\rho}$$

and

$$M = 2\,\frac{R\beta}{\rho}\,\frac{\Delta R}{\rho}\,B,$$

where V is aircraft velocity, R the standoff range, ρ the
resolution (both transverse and radial), β the beamwidth of the
physical aperture and ΔR the swath width being covered.

For spectral analysis

$$C = 2W\log_2\frac{W}{\delta f}$$

and

$$M = 2\,\frac{W}{\delta f}\,B,$$

where W is total bandwidth, δf the frequency resolution.
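
To make the spectral analysis relations concrete, the sketch below evaluates C = 2W log₂(W/δf) and M = 2(W/δf)B. The function names and the sample inputs (a 1.5 kHz band, 0.5 Hz resolution, 16-bit words) are assumptions made for illustration only.

```python
import math


def spectral_rate(bandwidth_hz, resolution_hz):
    """Operations per second, C = 2 W log2(W / delta_f)."""
    return 2.0 * bandwidth_hz * math.log2(bandwidth_hz / resolution_hz)


def spectral_memory_bits(bandwidth_hz, resolution_hz, word_bits):
    """Data held per transform in bits, M = 2 (W / delta_f) B."""
    return 2.0 * (bandwidth_hz / resolution_hz) * word_bits


if __name__ == "__main__":
    w, df, b = 1500.0, 0.5, 16     # assumed example values
    print(f"C = {spectral_rate(w, df):.0f} operations/sec")
    print(f"M = {spectral_memory_bits(w, df, b):.0f} bits")
```

Halving δf doubles the memory requirement but adds only 2W operations per second to the rate, since the resolution enters C only through the logarithm.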

A.  SUBMARINE SONAR SYSTEMS


Although  only  a  relatively  narrow  band  of  frequencies  is 
useful  in  submarine  sonar  systems  (owing  to  the  progressive 
attenuation  of  acoustic  waves  in  water  with  ascending  frequency), 
the  large  number  of  resolvable  beams  and  frequency  channels 
combine  to  create  a  considerable  signal  processing  requirement. 
The use of large arrays to form an approximately equal number
of proportionately smaller beams augments the detectability
of acoustic sources against the natural ambiance and,
furthermore, improves their detectability against surface traffic--
disproportionately--when the angular resolution reaches the
point that individual beams find clear channels between surface
ships. Very fine spectral resolution δf (0.05 Hz or smaller)
facilitates  target  classification  (if  not  identification)  on 
the  basis  of  the  propeller  and  crankshaft  rotations,  speed  of 
auxiliary  motors,  and  so  on. 

The powerful FFT algorithms are applicable to beam forming
and spectral analysis in tandem [spectral analysis rather than
time delay and addition produces beams only to the extent that
the narrow band approximation holds (Ref. 27), hence the
necessity of tandem FFTs--one temporal and one spatial--over the
elements of the array].

Neglecting certain fine points, the number of operations
per second (real multiplications and additions) involved in
monitoring a total bandwidth W with spectral resolution δf and
an array of beams of size θ (steradians) covering a sector of
total size Ω is given by*:

$$C = 2W\,\frac{\Omega}{\theta}\left(\log_2\frac{W}{\delta f} + \log_2\frac{\Omega}{\theta}\right).$$


*Ref. 28. However, the computational rate is expressed
differently.

Suppose, for example, W = 1.5 kHz, δf = 0.5 Hz, Ω = 1 sr,
θ = 0.002 sr, B = 1; then C = 3.6 x 10^ and M = 3 x 10^ bits.
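
The example can be evaluated numerically under one assumption: that the rate has the tandem (two-dimensional) FFT form C = 2W(Ω/θ)[log₂(W/δf) + log₂(Ω/θ)] and that the data volume is M = 2(W/δf)(Ω/θ)B. Both forms are reconstructions made for this sketch, not confirmed expressions, and the function names are illustrative.

```python
import math


def sonar_rate(w_hz, df_hz, omega_sr, theta_sr):
    """Operations per second for tandem temporal and spatial FFTs (assumed form)."""
    channels = w_hz / df_hz            # resolvable frequency channels
    beams = omega_sr / theta_sr        # resolvable beams
    return 2.0 * w_hz * beams * (math.log2(channels) + math.log2(beams))


def sonar_memory_bits(w_hz, df_hz, omega_sr, theta_sr, word_bits):
    """Data volume in bits, M = 2 (W/df) (Omega/theta) B (assumed form)."""
    return 2.0 * (w_hz / df_hz) * (omega_sr / theta_sr) * word_bits


if __name__ == "__main__":
    for df in (0.5, 0.05):
        c = sonar_rate(1500.0, df, 1.0, 0.002)
        m = sonar_memory_bits(1500.0, df, 1.0, 0.002, 1)
        print(f"delta_f = {df} Hz: C = {c:.3g} ops/sec, M = {m:.3g} bits")
```

Under these assumptions the rate is roughly 3.1 x 10^7 operations per second for δf = 0.5 Hz and roughly 3.6 x 10^7 for δf = 0.05 Hz, the fine resolution mentioned at the start of this subsection.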

The FFT processes blocks of data (in these examples,
20,000 words) and requires considerable numbers of transfers
between the processor and the memory section where the data is
stored (this data is transformed as the FFT unfolds). An
embodiment, at the MSI level, of a programmable processor with
this capability would exceed the acceptable limitations on
space (if not power and reliability). Even with the best
components commercially available today, a processor with only
about one fifth this capability might reasonably be mounted in
a submarine, with the operational consequence of lower search
rate (sequentially forming the beams), lower spectral resolution,
or both.

Several  advanced  experimental  passive  acoustic  systems  go 
well  beyond  the  submarine  systems  in  performance,  such  as 
1/20 Hz resolution over a 2-kHz band and 1500 beams.

Other  signal  processing  operations  used  in  submarine 
acoustic  array  processing  include  passive  ranging  and  signature 
recognition.  For  the  latter  function,  large  files  of  target 
signatures  are  maintained. 

Of the suggested hardware macros, the FFT butterfly,
associative memory elements, and programmable function calculator
appear to be particularly applicable to passive acoustic array
processing.

B.  SYNTHETIC  APERTURE  RADAR  (SAR) 

1.  Tactical Uses of SAR

Several  types  of  sensors  are  being  investigated  for  use  in 
autonomous,  stand-off  airborne  tactical  aircraft  systems  for 
attacking  vehicles  and  other  mobile  ground  equipment,  but  the 
synthetic  aperture  radar  having  a  resolution  of  a  foot  or  so  is 
at the most advanced stage of development. However, its
ability to detect and classify military vehicles in all
environments is not yet fully established.

2.  Spectral Analysis of Radar Returns

In  synthetic  aperture  radar  processing,  spectral  analysis 
is  used  to  segregate  ground  returns  on  the  basis  of  their 
doppler  shift;  all  returns  which  fall  within  a  doppler  band 
represent  scatterers  lying  within  two  hyperbolic  sectors  along 
the  ground.  The  finer  the  spectral  resolution,  the  closer  the 
corresponding hyperbolas (actually intersections of hyperboloids
with  the  earth's  surface). 

Our  purpose  here  is  to  relate  the  SAR  data-processing  rate 
(i.e.,  number  of  arithmetic  operations  per  second)  to  perform¬ 
ance  (resolution  and  swath  width). 

The synthetic aperture may be formed by first removing the
frequency deviation of each sequence of returns from a given
ground sector and then performing spectral analysis*. The
frequency deviation rate corresponds to the changing range of
a fixed target

$$R = \sqrt{r^2 + V^2 t^2}$$

(t = 0 being the time of closest approach),

$$R \approx r + \frac{V^2 t^2}{2R},$$

the corresponding echo delay

$$\tau' = \tau + \frac{V^2 t^2}{cR},$$

and doppler frequency deviation rate

$$\left|\frac{df}{dt}\right| = \frac{2V^2}{\lambda R}$$

(τ is the pulse echo delay at closest approach, c the velocity
of light, λ the radar wavelength).

*This is not to be confused with frequency deviation used in
linear FM pulses, as in the polar transformation techniques
used for spotlight SAR. With the latter methods, the FM
deviation of the local oscillator corresponds to the pulse
modulation and is repeated with each sweep.

This frequency deviation is removed by a frequency modulated
local oscillator, leaving fixed doppler shifts of target
echoes  from  across  the  beam  (of  the  physical  aperture).  The 
frequency  sweep  extends  over  as  many  pulse  returns  as  are  used 
to  form  the  synthetic  aperture.  The  frequency  sweep  is  then 
repeated  for  the  next  sector. 

In terms of the beam width β of the physical aperture, the
maximum interval between samples is

$$\tau_m = \frac{\lambda}{2V\beta}.$$

If the pulse intervals are longer than τ_m, there would be
doppler ambiguities within the main lobe. Actually, most radars
operate at shorter pulse intervals (they oversample), and the
redundant pulses are combined by interpolation before being
applied to the SAR processor. The suggested interpolator macro
would be useful for this purpose.

The total interval T needed to form a synthetic aperture
of transverse resolution ρ at range R is

$$T = \frac{\lambda R}{2V\rho}$$

(the length of the aperture L = VT, whence the angular
resolution φ = λ/(2L) and ρ = φR), and the minimum number of
returns (M) needed to form the aperture is given by

$$M = \frac{T}{\tau_m} = \frac{\beta R}{\rho}.$$


Spectral analysis of the M returns from each range cell
produces M spectral components--harmonics of the fundamental
1/T--each being the return from a segment of transverse
dimension ρ. But Mρ = βR (i.e., the entire arc across the
antenna beam), so that the M spectral terms which can be produced
from the M (prefiltered) samples by an FFT represent the
resolvable arcs over the entire beam.

In striving for finer resolution (smaller ρ), the processing
interval T will eventually reach the point that the
range of the target relative to the radar will change by more
than the range resolution δR; beyond this point the target
returns are not properly combined and the calculated resolution
is not achieved. The maximum number of independent sweeps M_1
which can be combined without the deleterious effects of "range
walk" is given by the condition

$$V M_1 \tau_m \frac{\beta}{2} < \delta R$$

(since V M_1 τ_m is the distance over which the radar has traveled,
V M_1 τ_m β/2 is the relative range increment for a target at the
edge of the beam--where the effect is greatest), but V τ_m β = λ/2;
hence

$$M_1 < \frac{4\,\delta R}{\lambda}.$$

An M_1 point FFT forms sub-apertures, each focused on a
different segment of the physical aperture. To achieve higher
resolution, the process is repeated M_2 times and all of the
results stored for "range walk" (or "wavefront curvature")
compensation prior to final spectral analysis, in which the
M_1 M_2 returns are processed, creating a synthetic aperture of
length M_1 M_2 V τ_m and transverse (cross range) resolution

$$\rho = \frac{\lambda R}{2 M_1 M_2 V \tau_m} = \frac{R\beta}{M_1 M_2};$$

in other words

$$M_2 = \frac{1}{M_1}\,\frac{R\beta}{\rho}.$$


The total processing then consists of an M_1 point spectral
analysis performed M_2 times, with the results collected and
stored. Then each group of M_2 returns which corresponds to a
given ground sector (as it passes through M_2 successive sector
positions) is range walk compensated--merely by selecting the
proper group from memory (somewhat analogous to the corner-turning
concept but less extreme). Spectral analysis of each
of the M_1 groups of M_2 data completes the process. The number of
arithmetic steps in forming the subarrays M_2 times is
proportional to M_2 M_1 log2 M_1, and to M_1 M_2 log2 M_2 for
processing the groups of M_2 points, for a total of
(M_1 M_2) log2 (M_1 M_2); the total complexity is that of an
M_1 M_2 point transform. To process a range swath ΔR with range
resolution δR, these calculations are performed ΔR/δR times;
generally ρ is made equal to δR.
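The relations in this subsection can be checked numerically. The following is a minimal Python sketch, not part of the original study. It assumes the sample interval is λ/(2Vβ), the aperture time is λR/(2Vρ), and the number of returns is βR/ρ. The 3 cm wavelength and the 60 x 100 split of the transform are illustrative assumptions, as are the function names.

```python
import math

def sar_geometry(wavelength, beamwidth, velocity, slant_range, rho):
    """Return (tau_m, T, M) for a side-looking synthetic aperture."""
    tau_m = wavelength / (2.0 * velocity * beamwidth)               # longest unambiguous pulse interval
    t_aperture = wavelength * slant_range / (2.0 * velocity * rho)  # time needed to form the aperture
    m_returns = beamwidth * slant_range / rho                       # M = T / tau_m
    return tau_m, t_aperture, m_returns

def two_stage_ops(m1, m2):
    """Arithmetic steps (up to a constant) for M2 transforms of M1 points
    followed by M1 transforms of M2 points."""
    return m2 * m1 * math.log2(m1) + m1 * m2 * math.log2(m2)

# Illustrative values only: wavelength 3 cm (assumed), beamwidth 0.03 rad,
# V = 100 m/sec, R = 100 km, rho = 0.5 m.
tau_m, t_aperture, m_returns = sar_geometry(0.03, 0.03, 100.0, 1.0e5, 0.5)
single_stage = m_returns * math.log2(m_returns)
staged = two_stage_ops(60, 100)   # 60 x 100 = 6000 returns (assumed split)
```

Under these assumptions the two-stage count equals that of a single 6000-point transform, which is the statement that the total complexity is that of an M_1 M_2 point transform.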

The returns at range R from over the entire beam are processed
by these arithmetic operations. If during this period
the radar had moved through less than one beam width (at the
range R), then an interval would remain, before the next batch
of computations would have to commence, which could be used to
complete the computations. On the other hand, if the radar had
moved more than Rβ, the computations would have to be completed
faster to avoid overlap. The factor by which the computation
rate must be adjusted from this effect is

$$\frac{VT}{R\beta} = \frac{\lambda}{2\rho\beta}.$$

Since the interval which elapses in the gathering of a batch of data
is M_1 M_2 τ_m = λR/(2Vρ), the net average computational rate
(addition and multiplication) is

C_1 = [expression illegible in source]

Similarly,

M = [expression illegible in source]

An ingenious technique--the polar format--has been developed
for wave-front-curvature compensation (Ref. 4). An optical
technique in its original form, it has been rather literally
transcribed into electronic processing. The polar algorithm
makes essential use of linear FM pulse modulation, which
partially obscures a direct comparison with the method already
treated. The received echoes are mixed against a local
oscillator whose frequency deviation rate matches the FM of
the pulse so that an isolated pulse echo appears as a CW burst
whose frequency is proportional to target range (pulse echo
delay). The range resolution depends on the pulse duration.

This concept (one that goes back at least to the early
1950's--known then as ORDIR) has merit in its own right for
providing disproportionately high range resolution in relation
to the processing bandwidth, but also turns out to be uniquely
suited to optical processing. In fact, if the successive
sweeps are photographed on a circular plate while it is rotated
at the proper rate, the range-curvature-corrected imaging is
subsequently accomplished by a single spherical lens! In its
optical form, the polar format algorithm could hardly be
simpler, but this is not true of its electronic equivalent.

Following the mixing of the return against the linearly
deviated oscillator, a two-dimensional (N_1 x N_2) FFT is
performed--N_1 = ΔR/δR in range and N_2 = βR/ρ in azimuth--involving a
number of computations proportional to N_1 N_2 log(N_1 N_2) and
(following the same development used previously) a computational rate

C_2 = [expression illegible in source]

which is greater than the previous result by the factor

$$\frac{\log(N_1 N_2)}{\log N_2}$$

which would normally about equal 2. However, the first FFT, in
range of the polar format process, might be "written off"
against pulse compression. The practical differences between
these two methods of SAR processing reside chiefly in the
procedures for storing and retrieving data from memory.
Numerically, if

    ρ  = 0.5 m
    δR = ρ
    ΔR = 10^[exponent illegible] m
    β  = 3 x 10^-2
    V  = 100 m/sec
    R  = 10^5 m

then

    C_1 = 10^[exponent illegible]/sec
    C_2 = 1.6 x 10^[exponent illegible]/sec

These computational rates somewhat exceed those for the passive
acoustic array processor.

The corresponding throughput rates for a 12-bit processor are

    T_1 = 7.2 x 10^[exponent illegible] g.c./sec
    T_2 = [value illegible] x 10^[exponent illegible] g.c./sec

while the memory requirement is

    M = [value illegible] x 10^[exponent illegible] bits



3.  Polar  to  Rectangular  Transformation 

The spectral analysis of radar ground echoes segregates
returns from radial sectors (strictly speaking, hyperbolic),
with their origin at the projected radar position. In strip
mapping, the data must be reformatted into rectangular
coordinates, a process which involves interpolation. In a
general-purpose computer the memory transfers and arithmetic operations
involved in this process often consume more time than the FFTs
themselves. Special computer architectures for image processing
have been proposed which virtually eliminate the memory
transfers by the use of shift register configurations which present
the needed data to the ALU in every cycle. These devices are
characterized by specially organized data storage (the
serpentine shift register string with multiple output points) (Refs.
17, 29, 30, 31).

C.  ELINT PROCESSORS


ELINT  receivers  identify  emitters  by  their  transmission 
spectrum  (bandwidth  and  center  frequency),  type  of  modulation, 
pulse  length  and  pulse  frequency  (of  radars),  etc. 

The observation of spectral characteristics requires
spectral analysis and is most efficiently performed by the FFT
algorithm. The total bandwidth W analyzed by an N point FFT
with spectral resolution δf = 1/T (T being the total sample
period) is simply

$$W = N\,\delta f = \frac{N}{T}.$$

If the bandwidth W is to be monitored continuously, the N point
FFT must be completed in the interval T, implying a computational
rate

$$C = W \log_2\!\left(\frac{W}{\delta f}\right).$$
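As a rough numerical check on this rate, the following Python sketch evaluates W log2(W/δf) for the 20 MHz band and 1 kHz resolution discussed next. The function name is illustrative. Constant factors for the arithmetic per butterfly are ignored, as they appear to be in the text.

```python
import math

def fft_monitor_rate(bandwidth_hz, resolution_hz):
    """Operations per second to transform continuously a band of the given
    width at the given spectral resolution (N log2 N steps every T = 1/resolution)."""
    n_points = bandwidth_hz / resolution_hz
    return bandwidth_hz * math.log2(n_points)

rate = fft_monitor_rate(20e6, 1e3)   # about 2.9e8 operations per second
```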

Thus, to monitor a total band of 20 MHz with a spectral
resolution of 1 kHz demands C ≈ 2.8 x 10^8 operations per second,
which  is  greater  than  the  previously  calculated  rate  for  either 
passive  sonar  or  SAR.  Clearly,  wideband  spectral  analysis 
requires  extreme  processing  rates,  and  the  complexity  of  this 
portion  of  ELINT  equipment  severely  constrains  the  current 
system  performance.  As  faster  circuitry  becomes  available,  the 
system's  capabilities  will  tend  to  expand.  Spectral  analysis 
is  appropriate  for  measuring  the  characteristics  of  complex 
(large  TW  product)  radar  pulse  and  communications  waveforms, 
but--somewhat  paradoxically--is  awkward  for  observing  the 
durations  and  center  band  frequency  of  simple  (rectangular) 
pulses,  for  which  purpose  sequency  filtering  (Ref.  32)--the 
discrete  Walsh-Hadamard  transform--is  more  suitable.  The 
computation  of  Walsh-Hadamard  transforms  involves  the  operation 
Σ_i (A_i ⊕ B_i), i.e., taking the numerical sum of XOR terms. It is
one of the recommendations of this study that this operation be
included as a machine instruction in any general-purpose
processor developed under government contract.
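The sum-of-XOR operation can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the mapping of bit 0 to +1 and bit 1 to -1 is an assumed convention. Under that convention the product of two ±1 samples corresponds to the XOR of their bits, so a Walsh coefficient over N samples equals N minus twice the XOR sum.

```python
def xor_sum(a, b):
    """Numerical sum of the bitwise XOR terms of two packed binary words."""
    return bin(a ^ b).count("1")

def sequency_correlation(a, b, n_bits):
    """Correlation of two n_bits-long +/-1 sequences stored as bits
    (assumed convention: bit 0 -> +1, bit 1 -> -1)."""
    disagreements = xor_sum(a, b)
    return n_bits - 2 * disagreements
```

On a processor with a W-bit word, one such instruction would handle W sequence elements per cycle, which is the appeal of providing it in hardware.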

The  enormous  flux  of  emanations  to  which  an  airborne  ELINT 
receiver  is  exposed  would  saturate  signal  processing  capacity 
(or  force  the  operator  to  curtail  the  bandwidth  being  covered) 
unless  a  high-speed  associative  processor  (for  recognizing 
strings  of  pulses  belonging  to  a  common  emitter)  were  available. 
The  associative  element  includes  a  content-addressable  memory 
which  is  addressed  by  the  identifier  word  and  generates  a  bit 
sequence  indicating  whether  the  identifier  matches  one  or  more 
words  already  in  memory.  In  more  flexible  associative  memory 
circuits,  any  subset  of  the  identifier  word  can  be  masked  and 
the  circuit  responds  by  giving  all  of  the  different  words 
already  in  memory  which  match  the  unmasked  portion  of  the 
identifier. The number of parameters used to characterize
modulation and the precision with which they are measured
determine the size of the input "identifier" word. For
example,  center  band  frequency,  pulse  width,  interpulse  period, 
and  possibly  angular  direction  data  characterize  a  simple  radar 
transmission,  and  if  these  were  specified  with  P-bit  precision, 
the  identifier  word  would  consist  of  4P  or  5P  bits  (depending 
on  whether  one  or  both  angular  coordinates  of  the  emitter  are 
measured).  The  total  memory  size  must  equal  the  total  number 
of  emitters. 
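The masked search described above can be modeled in software. The Python sketch below is a behavioral illustration only, not a description of any particular circuit. The field order, the 8-bit precision, and the function names are assumptions.

```python
def pack_identifier(fields, bits_per_field):
    """Concatenate P-bit parameter fields into one identifier word
    (first field in the most significant position)."""
    word = 0
    for value in fields:
        assert 0 <= value < (1 << bits_per_field)
        word = (word << bits_per_field) | value
    return word

def cam_search(memory, identifier, mask):
    """Return one match bit per stored word; only bit positions where
    mask is 1 take part in the comparison."""
    return [((word ^ identifier) & mask) == 0 for word in memory]

# Hypothetical emitters: (center frequency, pulse width, interpulse period, angle), P = 8.
P = 8
memory = [
    pack_identifier([120, 5, 200, 30], P),
    pack_identifier([120, 5, 200, 31], P),
    pack_identifier([90, 5, 200, 30], P),
]
query = pack_identifier([120, 5, 200, 0], P)
ignore_angle = ((1 << (3 * P)) - 1) << P     # mask out the low-order angle field
matches = cam_search(memory, query, ignore_angle)
```

In hardware all stored words are compared at once; the list comprehension here only reproduces the resulting match bits.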

An aircraft exposed to the emanations of several hundred
radars, each with an average pulse repetition frequency of 300
per second (say), would receive on the order of 10^5 pulses per
second, an average of one every 10 μsec. The actual spacing
between pulses will be effectively random and in some cases
will be much shorter than the average, but the speed requirement
for the content-addressable memory can be alleviated
considerably by introducing a buffer which, during short intervals
of dense signal arrivals, stacks identification words until the
associative memory element can process them. By comparison, a
commercially available bipolar content-addressable memory
circuit (the INTEL 3104) introduces a maximum delay of only 30 nsec,
although the circuit stores only 16 bits (4 x 4). The ELINT
applications call for more dense circuitry rather than higher
speed and thus appear suitable for NMOS.

D.  COMMUNICATIONS 

PN coding sequences are widely used in communications
(such as JTIDS) and navigation (GPS) systems, for protection
from spoofer repeaters, and to obtain greater processing gain
for lower jammer vulnerability or lower probability of intercept.
In its simplest form, a sequence generator produces a code
sequence C_i synchronously with the message bit sequence B_i and
the product sequence T_i = C_i ⊕ B_i is transmitted*. Code
stripping (removal of the code sequences and recovery of the
message bits) is effected at the receiver by the synchronous
application of the same coding sequence, C_i ⊕ T_i = C_i ⊕ C_i ⊕ B_i =
B_i. In some cases very long (several days at 10 Mb/s) coding
sequences are produced and new generating parameters are applied
for each complete sequence (Ref. 33). The code generator may
consist of several tens of thousands of equivalent logic gates
of circuitry (largely in the form of registers) (Ref. 16).

*⊕ signifies the exclusive OR characterized by 1⊕0 = 0⊕1 = 1,
1⊕1 = 0⊕0 = 0, hence C ⊕ C ⊕ B = (C⊕C)⊕B = 0⊕B = B.

In practice, PN sequences are also used to expand the
transmitted data rate relative to the message bit rate for
processing gain against natural or malicious noise sources by
running the code sequence at a higher rate, so that each message
bit is applied to a long sequence of code bits; the processing
gain being just the number M of code bits applied to each
message bit. The noise power may then force numerous bit
reversals while the majority of the output bits, for any given
message bit, remains correct. This is, in fact, the proper use
of the redundant bits--computing the plurality (excess of 1s
over 0s or vice versa); viz., taking the numerical sum
Σ_i (C_i ⊕ B) for each message bit B. This operation could well be
provided for in a general-purpose processor with useful data
rates. For example, a 16 bit, 5 MHz processor with this
instruction could keep pace with an 80 Mb/s message stream. This
operation was discussed earlier in connection with the Walsh
transform.
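The spreading and plurality decoding just described can be sketched as follows in Python. This is an illustration under stated assumptions. A seeded software random generator stands in for the PN sequence generator, M = 15 code bits per message bit is an arbitrary choice, and the deliberate reversal of every fourth code bit is a stand-in for noise.

```python
import random

def pn_sequence(length, seed=1):
    """Stand-in for a PN generator: a reproducible pseudo-random bit list."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(length)]

def spread(bits, code, m):
    """Apply m successive code bits to each message bit by exclusive OR."""
    return [b ^ code[i * m + j] for i, b in enumerate(bits) for j in range(m)]

def despread(chips, code, m):
    """Strip the code and decide each message bit by plurality."""
    out = []
    for i in range(len(chips) // m):
        ones = sum(chips[i * m + j] ^ code[i * m + j] for j in range(m))
        out.append(1 if 2 * ones > m else 0)
    return out

message = [1, 0, 1, 1, 0]
m = 15
code = pn_sequence(len(message) * m)
sent = spread(message, code, m)
# Reverse every fourth transmitted bit to imitate noise-induced errors.
received = [c ^ 1 if k % 4 == 0 else c for k, c in enumerate(sent)]
recovered = despread(received, code, m)
```

Because fewer than half of the code bits belonging to any one message bit are reversed, the plurality decision still returns the original message.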

Greater  processing  gain  for  a  given  message  source  can  be 
obtained  by  increasing  the  code  sequence  rate  and  hence  the 
signal  bandwidth,  but  this  implies  proportionately  more  precise 
synchronization  between  the  receiver  and  transmitter  sequence 
generators, which becomes infeasible in practice. In mobile
equipment where the propagation delay is uncertain, open loop
clock synchronization is not even a theoretical possibility.
For these reasons, receivers are designed either with matched
filter embodiments of the code stripper--which function
asynchronously--or with some means of automatic acquisition.

The matched filter combines the code stripping and the
summation operations and, in effect, tries all positions of the
decoding sequences relative to the transmitted sequences and
produces its maximum output at registration. The matched filter
embodiment might be in the form of a surface acoustic wave
device (SAW), a CCD circuit, or an LSI (or VLSI) digital circuit
(Ref. 34). Generally, the matched filter embodiment would be
considerably more complex than the synchronous (correlation)
decoder were it not for the synchronization problem, since the
relative complexity of the correlation decoder depends on the
uncertainty in propagation time delay or clock synchronization
(between transmitter and receiver). If, for example, the code
bit rate (B) were 10 MHz and the time delay uncertainty (D)
were 10 μsec (a two-mile position uncertainty), then the
correlation between the reference clock and the received signal
would have to be computed for 100 different relative phases--
either simultaneously or sequentially. If M were DB = 100 or
less, then the correlation decoder embodiment degenerates to
the matched filter, but if M greatly exceeded DB, the correlator
would be potentially simpler.
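The search over DB relative phases can be illustrated with a sequential software correlator. The Python sketch below is only a model of the idea. The code is a seeded pseudo-random stand-in for a real PN sequence, the received segment is noise-free, and the true offset of 37 code bits is an arbitrary choice.

```python
import random

def make_code(length, seed=7):
    """Reproducible pseudo-random stand-in for a PN code sequence."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(length)]

def acquire(received, code, n_phases):
    """Try each candidate phase in turn and return the one with the most
    agreements between the received segment and the local code."""
    best_phase, best_score = 0, -1
    for phase in range(n_phases):
        score = sum(1 for k, chip in enumerate(received) if chip == code[phase + k])
        if score > best_score:
            best_phase, best_score = phase, score
    return best_phase, best_score

n_phases = round(10e6 * 10e-6)        # B x D = 100 candidate phases
code = make_code(400)
received = code[37:37 + 200]          # segment arriving with an unknown 37-bit delay
phase, score = acquire(received, code, n_phases)
```

A matched filter forms all of these correlations at once as the signal slides through it, whereas this loop forms them one after another; the amount of arithmetic is the same in both cases.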

Code application requires more complicated processing for
those applications (such as GPS) in which the transmitter and
receiver are in relative motion, where the resulting doppler
shift of the carrier frequency causes loss of coherence.
Special circuits for automatic frequency and phase acquisition
accomplish the doppler "wipe-off", but this can only be done as
an integral part of code sequence synchronization. The same is
true when frequency and time hopping are employed (to further
increase processing gain). In general, each degree of freedom
in the signal, whether intentional or extraneous, increases the
complexity of the acquisition circuitry multiplicatively.

Many LSI circuits have already been developed commercially
for communications, such as companding a/d and d/a encoders and
decoders for pulse code modulation (PCM) lines, 2 million of
which are going into service annually; while microprocessors
are being used in packet-switching gear, multiplexer-concentrators,
digital voice terminals, and so on. The Army is in
the process of procuring an intelligent communications terminal,
the AN/UGC-74, which performs message composition and editing,
header prompting, printer control, message buffering, etc.,
under the control of a microprocessor (the INTEL 8080).

E.  VOICE ABSTRACTION


Bandwidth (data rate) compression of speech is an important
process in modern and foreseeable future military communication
systems. The universal characteristics of speech and the
well-defined characteristics of digital communication systems suggest
that speech abstraction and the inverse process, speech
synthesis, are appropriate applications for large-scale integrated
circuits.

If standard speech analyzers and synthesizers for military
communications could be defined, the volume of production would
almost certainly be large enough to justify the design of custom
LSIC chips for this application. Unfortunately, standardized
speech processing equipment is not yet available; instead, most
such devices are programmable to accommodate a variety of
processing algorithms and formats. This programmability
reflects the fact that speech reproduced from the widely used
narrowest band format, 2.4 kb/s, suffers annoying distortions
and loss of speaker recognition. Efforts to improve it are
continuous, so speech processing equipment is usually designed
to be flexible enough to accommodate improved algorithms as
they become available.*

Another  obstacle  to  standardization  of  speech-processing 
equipment  is  the  variety  of  communication  bandwidths  provided 
by  military  systems.  Tactical  communication  systems  often 
require  narrowband  signals.  2400  b/s  is  commonly  used  through 
such  links,  including  HF  radio  links,  VHF,  and  manpack  equipment. 
On  the  other  hand,  high  quality  wire  lines,  cable,  and  satellite 
links  are  available  for  important  command  and  control  components 
of  military  operations.  Often  a  message  must  be  routed  through 
a  tandem  combination  of  several  links  of  different  bandwidths. 


*The  Rate  Distortion  Coder  being  a  case  in  point  (Ref.  35). 


The  quality  of  speech  delivered  at  the  end  of  such  a 
system  can  be  no  better  than  that  provided  by  the  narrowest 
bandwidth  link  in  the  system.  The  quality  suffers  further  if, 
as  is  usually  the  case,  wideband  signals  must  be  demodulated 
and  decrypted,  then  compressed  in  bandwidth  and  remodulated 
and  reencrypted  to  pass  through  the  narrow  bandwidth  segment. 

A  direct  attack  on  the  standardization  problem  has  been 
undertaken by the Naval Research Laboratory. They have developed
a demonstration model of a Multiple Rate Processor (MRP)
which  can  process  a  speech  signal  into  either  a  narrowband 
2.4  kb/s  digital  signal,  or  into  a  9.6  kb/s,  high-speech-quality 
digital  signal  which  includes  the  digitized  2.4  kb/s  signal  as 
a  subset.  When  the  wideband  signal  is  transmitted  through  a 
tandem  set  of  links,  if  a  narrow  bandwidth  link  is  encountered, 
it  is  possible  to  strip  off  the  bits  required  for  the  wideband 
signal  without  demodulating  the  speech,  and  to  transmit  it  into 
the  narrowband  link  at  2.4  kb/s.  Upon  returning  to  a  wider 
band  link,  it  is  possible  to  change  format  back  to  9.6  kb/s 
(although  some  speech  quality  has  been  necessarily  lost  in  the 
narrowband  link)  again  without  demodulation. 

Preliminary study by NRL leads them to the conclusion that
the MRP could be implemented with ten LSI chips which would
include five custom chips to produce the 2.4 kb/s narrowband
signal. The remaining chips would be used to process the
excitation signal for the 9.6 kb/s format involving a 128-point
FFT. The total power consumption is estimated to be less than
6 W.

When  development  now  in  progress  is  completed,  the  MRP 
could  be  used  in  a  very  large  fraction  of  speech  processing 
applications.  The  production  volume  would  be  large  enough  to 
justify economically the design of custom LSI chips for that
equipment.


A number of other speech processors exist or are under
development. The software-dependent processors tend to require
high-speed circuitry to compensate for the time delay in
software-controlled manipulation (for example, fetching and
replacing numbers from and to memory). For example, a speech
synthesizer developed by Lincoln Laboratory uses Emitter-coupled
Logic (ECL) gate arrays and consumes 60 W. It operates at
4.8 kb/s, and contains eleven ECL gate arrays and 57 other
chips, or "logical packages". As another example, Ketron
developed a software-controlled processor utilizing 150
standard chips on three circuit boards. Their preferred form
utilizes TTL technology and consumes around 40 W. A CMOS
version was constructed which operated on 12 W. Both versions
require approximately 16,000 gates, exclusive of
8 memory chips, 4 x 16 kb capacity.

Instead  of  using  a  programmable  signal  processor,  it  is 
possible  to  design  a  hard-wired  "flow-through"  processor  in 
which  a  signal  passes  from  section  to  section  as  it  is  processed. 
Since several successive computations are performed
simultaneously in the several sections, the need for high-speed
circuitry  is  relieved,  permitting  use  of  MOS  LSI.  The  NRL  MRP 
is  of  this  type  with  no  programming  flexibility,  but  well  adapted 
to  iterative  computation  using  LSIC. 

Since  standardization  of  speech  processors  is  highly 
probable,  and  since  the  number  of  such  devices  needed  by  the 
DoD  will  be  very  great,  speech  analyzers  and  synthesizers  for 
bandwidth  compression  and  digitization  are  prime  candidates  for 
custom  LSIC  design  and  procurement.  The  benefits  would  include 
compactness,  reliability,  and  low  power  consumption. 

The  use  of  VLSI  technology  would  carry  further  the  same 
benefits.  Using  the  "pipeline"  or  flow-through  processing 
architecture  of  the  NRL  MRP,  VLSI  would  be  more  appropriate  than 
VHSI,  since  no  requirement  for  unusually  high  speed  exists  with 
this  type  of  signal  processing. 


F.  GLOBAL  POSITIONING  SYSTEM 


The  GPS  (Global  Positioning  System)  is  a  navigation  system 
in  which  the  user  determines  his  position  by  processing  passive 
range  measurements  to  each  of  four  separate  satellites.  The 
satellites  transmit  data  on  their  own  orbit  elements  and  clock 
error,  as  well  as  transmit  a  signal  which  embodies  the  clock 
time signal. The user observes this time signal by
synchronizing his pseudo-random local code generator to the code
transmitted  by  the  satellite,  then  observing  the  time  difference 
from  his  own  clock.  By  making  such  observations  on  any  set  of 
four  of  the  24  satellites  planned  for  the  system,  each  of  which 
transmits  a  distinctive  code,  the  user  can  compute  both  his 
position  and  local  clock  error.  Accuracy  of  the  order  of  10  to 
50  m  is  achievable  anywhere  on  the  earth. 

The  availability  of  this  system,  now  under  development  and 
planned  to  be  operational  in  the  mid-eighties,  is  eagerly 
awaited  by  many  potential  users,  military  and  civilian. 

The user equipment consists of two main parts involving
quite different types of components. The first part is a
sensitive radio receiver with a small, hemisphere-coverage
antenna adapted to receive the L-band signals radiated from
the satellites. There are two carrier frequencies, L1 =
1575.42 MHz and L2 = 1227.60 MHz. Each carrier is PSK-modulated
by two signals in phase quadrature, the P-code and the C/A code.
The bit-rate of the pseudo-random P-code is 10.23 Mb/s and its
repetition period of 7 days makes it nearly impossible to
acquire synchronization without the assistance of the lower bit
rate C/A code, which repeats every millisecond and has a bit-rate
of 1.023 Mb/s, one tenth of the P-code. Once the receiver is
synchronized with the C/A code, a 50 b/s data stream is received
which contains, among other things, information which facilitates
handover to the P-code, which because of its higher bit rate
permits greater accuracy of range measurement.

The receiver section proper, of the user equipment, ends
with the demodulation of the received satellite signal and
lock-on of the local code sequence generator to the satellite
sequence generator.

The second section of the user equipment for the GPS is
the signal processor. The task for this processor is
remarkably complex:

1.  It must read from each satellite contacted its
Keplerian orbit elements and clock correction.

2.  From the orbit elements, it must calculate the
true satellite position vs time with accuracy
of the order of 1 ft.

3.  It must measure the time difference between the
received satellite code sequence and the local user
clock with an accuracy better than 10 nsec, and it
must do this for four satellites. These
time differences multiplied by the speed of
light determine the "pseudo ranges" to the
satellites in a common time frame.

4.  From the pseudo-ranges of four selected
satellites it must calculate user location and clock
error.

5.  Each satellite broadcasts as a part of its 50 b/s
signal an almanac which gives the approximate
locations of the other 23 satellites. Prior to
Step 1 above, the user system must select a
suitable set of four satellites, widely separated
in angle but not too near the horizon. It is
this selected set (identified by their distinctive
code sequences) upon which the ensuing four
steps must be performed.
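Step 4 above can be sketched as an iterative solution of four range equations in four unknowns. The Python sketch below is a minimal illustration, not the method of any particular user set. It assumes each pseudo range equals the geometric range plus a common clock term expressed in meters, and it uses Newton's method with a small Gaussian-elimination routine. The satellite coordinates, the user position, and the function names are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def gps_fix(sats, pseudoranges, iterations=15):
    """Return (x, y, z, clock_term_m) from four satellite positions (m)
    and four pseudo ranges (m), starting from the earth's center."""
    est = [0.0, 0.0, 0.0, 0.0]
    for _ in range(iterations):
        jac, resid = [], []
        for (sx, sy, sz), pr in zip(sats, pseudoranges):
            dx, dy, dz = est[0] - sx, est[1] - sy, est[2] - sz
            rng = math.sqrt(dx * dx + dy * dy + dz * dz)
            jac.append([dx / rng, dy / rng, dz / rng, 1.0])  # partial derivatives of predicted pseudo range
            resid.append(pr - (rng + est[3]))                # measured minus predicted
        step = solve_linear(jac, resid)
        est = [e + s for e, s in zip(est, step)]
    return tuple(est)

# Hypothetical geometry for illustration (meters, earth-centered coordinates).
sats = [(20e6, 10e6, 14e6), (5e6, 25e6, 8e6), (-8e6, 12e6, 22e6), (12e6, -6e6, 23e6)]
truth = (3.0e6, 4.0e6, 3.9e6)
clock_term = 29979.2458   # a 100 microsecond clock error times the speed of light
pseudoranges = [
    math.sqrt(sum((t - s) ** 2 for t, s in zip(truth, sat))) + clock_term
    for sat in sats
]
fix = gps_fix(sats, pseudoranges)
```

Each iteration involves only a few dozen multiplications and square roots, which is consistent with the remark that extreme speed is not necessary for this part of the processing.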

The foregoing processing steps involve mainly low-frequency
digital operations and are obviously suitable for LSIC technology.
Extreme speed is not necessary; a sequence of fixes at intervals
of several minutes is often all that is required for navigation
applications.

Present plans for user equipment envision a signal
processor based on one or more microprocessors with the usual
accompanying PROM and RAM chips. In spite of the slow speed allowed,
a large number of microcircuits (chips) are embodied in even
the simplest version of user equipment. Examination of one
specific design of such a processor revealed that there were a
total of 453 digital microcircuits in the signal processor, of
which 125 were LSIC (including 112 4K x 1 random access memory
chips).

It is tempting to speculate on the improvement in reliability
and compactness of this equipment which would result if as
much as possible of the circuitry were embodied in a few large
LSI chips. The signal-processing portion of the GPS user equipment
may be an excellent candidate for VLSI, since its function
is clearly defined, excessive speed is not required, and the
number of input-output pins is relatively small because most of
the interconnections would be internal.

The analogue portion of the equipment may be another
story. There are 596 resistors, 68 coils, 15 analogue filters,
56 analogue microcircuits (MSI), 579 capacitors, and 15 single
transistors in the receiver for the specific system cited above,
and it is doubtful if any technique of integration so far
discovered could, at present, render it much more compact or
simple. It is likely that the final cost of the equipment will
be largely controlled by the receiver section of the GPS user
equipment.

It is the reliability which will be greatly improved by
application of VLSI (or even LSI) technology to the signal
processing portion of the equipment, since this circuitry is
two orders of magnitude, at least, more complicated than that
of the receiver.

The foregoing discussion assumes that the GPS user unit
will be a dedicated piece of navigation equipment, made as
compact and simple as possible to use. An entirely different
approach has been favored by some equipment manufacturers.
According to this second point of view, the user equipment
could take the form of a receiver unit, similar to that described
above, plus a general purpose, software-controlled
minicomputer which would be programmed to perform the
signal-processing calculation. An argument for this approach is that
the computer can perform many other "housekeeping" functions
which would be of service to some types of users. Another
argument is that the software might be adapted to the needs
of particular types of users. Still another argument is that
inexpensive but powerful minicomputers are now standard articles
of technology and the use of such standard items can help to
keep user equipment cost down. Here the role of VLSI or LSI
is to further improve the general-purpose computer portion of
the equipment.

G.  OPTICAL  ARRAY  SIGNAL  PROCESSING 

A variety of applications have been found for arrays of
optical (usually IR) sensors, which characteristically involve
computationally intensive processing. These applications include
satellite surveillance for the monitoring of missile
launches, detection of incoming high-speed targets for terminal
defense, and homing missile seekers.

One processing function common to all of these arises from
unequal sensitivities of the sensor elements in the array.
These are sometimes equalized by storing the sensitivity constants
for each element in a ROM which is read out and used to
renormalize the signals during each frame. Another common problem
is discrimination against bright but extraneous fixed sources
(such as smoke stacks, blast furnaces, solar glint), which
necessitates frame-by-frame processing to distinguish them from
moving radiators. When the surveillance platform itself is in
motion this process involves computational use of inertial
reference data.
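
The two operations just described can be sketched in a few lines of Python. This is only a minimal illustration under simple assumptions: the stored sensitivity constants are treated as relative gains by which each raw sample is divided, and fixed sources are rejected by thresholding the difference between consecutive equalized frames. The function names and the threshold are hypothetical.

```python
def equalize(frame, gains):
    """Renormalize one frame of raw samples by the stored per-element gains."""
    return [sample / gain for sample, gain in zip(frame, gains)]


def changed_elements(previous, current, threshold):
    """Return indices of elements whose equalized level changed between frames.

    Bright but fixed sources give the same level in both frames and are
    therefore rejected; only moving (or newly appearing) radiators remain.
    """
    return [
        index
        for index, (old, new) in enumerate(zip(previous, current))
        if abs(new - old) > threshold
    ]
```

For a platform in motion the two frames would first have to be registered against each other using the inertial reference data, a step this sketch omits.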

In optical sensor systems for monitoring missile launches,
another stage of computation attempts to discriminate between
threatening and nonthreatening trajectories on the basis of the
available angle and angle rate data. The computations are
arithmetically intensive and require considerable precision. The
general class of algorithms which generate estimates of trajectory
parameters from angle and angle rate data is developed in
Ref. 42. The number of arithmetic operations which execute
these algorithms is of the order of 100 per object per frame.

Optical surveillance for terminal defense against high-speed
missiles necessitates intensive use of signal integration
to provide the greatest detection sensitivity relative to
search rate. Staring (matrix) arrays of photosensitive devices
are used and the signal processor designates likely target
returns for examination by an operator while allowing the telescope
to scan at a high enough rate to provide timely detection
of all incoming missiles.

The  sources  of  detectable  IR  radiation  include:  plasma  at 
the  stagnation  point  on  the  leading  edges  of  missiles  flying  at 
supersonic  speeds,  rocket  plumes,  and  jet  engine  exhausts. 

Target  designation  is  based  on  the  apparent  brightness  and 
stability  of  point  sources  against  background  (clear  sky, 
clouds,  or  solar  reflections,  etc.). 

Two separate signal processing functions are performed:
signal conditioning and automatic target designation. Signal
conditioning refers to adjustments of signal levels prior to
quantization, equalizing signal levels from the various elements
of the array, so as to obtain maximum discrimination
(between target and ambient ground) with any given number of
quantization levels (bits per sample). Altogether the
signal-processing operations have the following purposes:

1. Virtual image immobilization

2. Clutter mapping

3. Dynamic range control

4. Change detection

5. Target designation.

An interesting class of missiles ("fire-and-forget") uses
processed optical array data for target identification. For
this purpose, a series of operations must be performed for the
general purpose of (a) image improvement (through noise
suppression, image enhancement, and feature intensification);
(b) image registration (in those systems which carry a stored
image of the intended target); (c) feature selection (by edge
detection, boundary definition, shape extraction, and
characterization); and finally (d) object recognition.

Algorithms for this purpose are still being developed, but
all are computationally intensive (several hundred million
operations per second). Because of the very limited space and
the relatively low allowable cost, the use of VLSI is mandatory.

H.  TOTAL  THROUGHPUT  RATES 

The total throughput rates (in millions of operations per
second) for various equipments and systems are summarized in
Table 1. The smaller figures are representative of the most
advanced currently operational equipment, while the larger
figures refer to systems still in planning or under development.

TABLE 1. THROUGHPUT RATES IN MILLIONS OF
OPERATIONS PER SECOND

Programmable A/J Communications       10 to 500
Optical Surveillance Equipment       100 to 2000
Radar Processors                      50 to 1000
Missile Sensors and Guidance           2 to 50
Acoustic Processors                  100 to 1000
Airborne Early Warning Systems       100 to 3000



The total processing load for all avionics in some types of
next-generation aircraft systems has been estimated at 3 x 10^9
operations per second, typically with 16-bit words. A throughput
rate of this magnitude in a small aircraft is clearly
infeasible using integrated circuits of the type commonly found
in today's military equipment. (Such equipment would weigh
approximately 10,000 kg and consume about 100 kW.)

Instead, new IC components are needed which provide about
an order of magnitude higher throughput in relation to power,
weight, failure rate, etc., and this can be accomplished by a
continuation of the rapid process of product innovation and
improvements which has characterized the IC industry. The
technological sources of this progress have been varied;
algorithmic analysis and circuit refinements and cleverness have
contributed heavily and will continue to do so; however, the
throughput rate per circuit depends most directly on the minimum
dimension of circuit components, which accounts for the emphasis
placed on lithography by military system specialists. The
characteristics of ICs under scaling (reduction in the dimensions
of circuit elements) involve a number of fine points such as
the average lengths of internal interconnections (see Section V
below), but for present purposes, the approximate inverse cubic
relationship between throughput and feature sizes for fixed chip
area is adequate. The inverse cube law is based on the assumption
that circuit speed varies (approximately) inversely with feature
size under scaling while circuit density varies as the inverse
square.
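
The inverse cube law can be written out as a short calculation. The sketch below is only an illustration of the stated assumption (speed inversely proportional to feature size, density inversely proportional to its square, chip area fixed); the function name is hypothetical.

```python
def throughput_gain(old_feature_um, new_feature_um):
    """Approximate throughput gain for a fixed chip area under the cube law."""
    ratio = old_feature_um / new_feature_um
    speed_gain = ratio          # circuit speed varies inversely with feature size
    density_gain = ratio ** 2   # gates per unit area vary as the inverse square
    return speed_gain * density_gain
```

For example, shrinking from 6 um to 2 um design rules gives a ratio of 3 and hence a nominal gain of 3 x 9 = 27 in throughput per chip.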

The potential effect of the scaling of IC feature size on
throughput is indicated in Fig. 3, which summarizes the
theoretical analysis given in Section V and design data from a
number of sources. Since not all bit manipulations can be
related  to  computer  operations,  the  total  throughput  is  more 
precisely  described  in  terms  of  gate  cycles  per  second  (the 
product  of  the  total  number  of  gates  in  the  system  and  the 
switching  rate).  For  comparative  purposes,  2  x  10^  gate  cycles 
have been equated to an average operation, reflecting the
disparity between the number of gates in the system and the
fraction  which  are  actually  operative  on  a  given  cycle,  a 
fraction  which  varies  considerably,  depending  on  the  extent  of 
concurrency  or  pipelining  (the  simultaneous  execution  of  more 
than  one  operation).  The  line  labeled  "nominal  throughput" 
follows  the  cube  law  of  scaling  and  is  intended  to  represent 
the  average  throughput  which  might  actually  be  realized  in  a 
large  assembly  of  circuits,  taking  account  of  inefficiencies 
(due  to  high-speed  processors  servicing  slower  I/O  devices, 
for  example). 

The approximate relationship between total power consumption
and weight of IC assemblies and minimum circuit element
dimensions is shown in Fig. 4 for various total system throughputs
based on the nominal throughput of Fig. 3. A current
airborne early warning aircraft system, which contains large
quantities of ICs built to 5 µm to 7 µm design rules, dissipates
about 6 kW and contributes 600 kg or so to the weight of the
host aircraft (excluding prime power). Future versions of
similar systems are expected to require nearly 3 x 10^9 operations
per second (3 Bops), but the available power and payload
will actually be less. Figure 4 indicates that this can be
accomplished by the use of circuits with 1 µm to 2 µm feature
sizes throughout the system.

Already, several IC manufacturers are producing circuits
with 2 µm dimensions in limited quantities and the feasibility
of 1½ µm manufacturing technology seems well proven. In the
submicron region, the present (optical) lithography techniques
are economically unattractive and alternative methods are being
vigorously explored.

FIGURE 4. Weight and power consumption of IC assemblies

IV. HARDWARE MACROS,
THE MACRO-SUBSTITUTION SEQUENCE

A.  THE  EMERGENCE  OF  HARDWARE  MACROS 

The history of computer structural architecture reflects
the continual struggle of designers to achieve the greatest
throughput rate and ease of programming, etc., within constraints
of cost, failure rate, power consumption, etc., imposed
by component technology (circuit and otherwise). Many of the
practices initiated during the development of the first
programmable electronic computers, such as the accumulator and
one-address instruction, are still sometimes followed, even though
the cost coefficients and reliability factors have changed
profoundly, largely due to advances in microelectronics
technology.

However, this technology has already inspired important
computer circuit and architectural innovations, such as
bit slices and microprogramming; and has produced--in response
to requirements for very high arithmetic throughput rates
originating primarily in signal processing applications--the
one-chip multiplier. The single-chip multiplier--a logically
complex, dedicated special-purpose circuit, demanding the
fastest and densest technology--is treated by the central
processor unit much like a memory or peripheral unit; and
accomplishes with one or two addressing instructions, in a few
cycles, an operation often requiring hundreds of lines of machine
code (the multiply macro). Although the circuit itself is
comparatively large, the pin-out requirements are not. These
attributes of the hardware macro, namely its compatibility
with existing microprocessors and microprogrammed bit slices,
its ability to execute complex operations rapidly from simple
addressing sequences, its attractive cost, and relatively modest
interface requirements, invite a search for other useful forms.

Several hardware macros are suggested by the applications
considered in the preceding sections; specifically (1) the
SINE-function calculator (for use in frequency synthesis, frequency
and phase acquisition in communication, polar to rectangular
coordinate transformations in SAR, and passive ranging in
SONAR); (2) the self-organizing memory (automatic sort and
merge) for use in ELINT and JTIDS; (3) the FFT butterfly for
use in SONAR, SAR, ELINT, communications, etc.; (4) the LOG,
EXP, etc., calculators and the interpolator for general signal
and data processing. However, the utility of these and other
hardware macros can only be determined by a detailed applications
analysis.

In JTIDS and ELINT equipment, among others, the incoming
data must be sorted (according to one or more parameters such
as angle of arrival, frequency, amplitude) which can be a
time-consuming process in a conventional programmable processor.
This is another function for which the development of a hardware
macro may be justified.

In its simplest form this unit might consist of two pushdown
stacks, one of which normally contains all of the sorted
data. When a new datum arrives, it is compared with the uppermost
number in the stack; if the new datum is larger it is
pushed into the stack, otherwise the top number of the stack is
popped and transferred onto the second stack, this process
continuing until a number smaller than the new datum reaches the
top of the stack, at which point the new datum is pushed onto
the stack, followed in reverse order by all of the data which
had been pushed onto the second stack.
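
The two-stack procedure can be modeled in software as follows. This is a minimal sketch of the behavior described above, using Python lists as the pushdown stacks (the end of the list is the top); the function names are hypothetical and no claim is made about how a hardware unit would be organized internally.

```python
def insert_sorted(main, spare, datum):
    """Insert datum into the sorted stack `main`, using `spare` as the second stack.

    `main` is kept in ascending order with the largest number on top.
    """
    # Pop numbers that are not smaller than the new datum onto the second stack.
    while main and main[-1] >= datum:
        spare.append(main.pop())
    # A smaller number (or the bottom of the stack) has been reached.
    main.append(datum)
    # Return the displaced data in reverse order of their removal.
    while spare:
        main.append(spare.pop())


def sort_stream(data):
    """Feed a stream of incoming data through the two-stack sorter."""
    main, spare = [], []
    for datum in data:
        insert_sorted(main, spare, datum)
    return main
```

Each insertion costs at most two transfers per stored word, so the worst-case time per datum grows linearly with the number of words already held.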

If the number of bits B in the stored words is large compared
to the number of bits needed to address the N words
(i.e., log2N << B), then
the circuitry can be simplified and made faster by storing the
data in a RAM and sorting the pointers (RAM addresses).

In many applications the datum consists of a set of parameter
values (angle of arrival, center frequency, time of
arrival, modulation parameters, etc.) and sorting is only
relevant with respect to one of them. This situation is
accommodated by masking out the data not involved in the sorting
operations.
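
Pointer sorting with a mask can be illustrated as follows. The sketch assumes, purely for illustration, that each datum is packed into one integer word and that the field of interest is selected by a bit mask; the word layout and the function name are hypothetical.

```python
def sort_pointers(ram, mask):
    """Return RAM addresses ordered by the masked field of each stored word.

    The stored words themselves are never moved; only the short addresses
    (pointers) are rearranged.
    """
    return sorted(range(len(ram)), key=lambda address: ram[address] & mask)
```

With words laid out as an 8-bit angle field above a 16-bit frequency field, `sort_pointers(ram, 0xFFFF)` orders the entries by frequency alone and ignores the angle bits.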

B.  THE  MACRO-SUBSTITUTION  SEQUENCE 

The insertion of a hardware macro into a programmable processor
increases the total number of gates but may actually
result in a lower number of gate cycles needed to execute a given
set of instructions, depending on the execution frequency of the
macro and the number of gate cycles taken to execute the same
macro from software (by a sequence of instructions applied to
an arithmetic logic unit, for example). The following analysis
explores this question in a general way, and reveals the rather
surprising fact that a group of hardware macros may result in a
higher throughput rate (fewer gate cycles per instruction) even
when some individual macros do not. To put it differently, one
good macro sometimes makes way for another. The same analysis
can be applied to the use of software or firmware macros.

Let $G_0$ be the number of gates in a processor (which may
contain a sequencer, arithmetic logic unit, microcode ROM, etc.),
let $f_i$ be the relative frequency of the i'th instruction, and
$c_i$ the number of cycles taken to execute it. The average number
($N_0$) of gate cycles per instruction will be

$$N_0 = G_0 \sum_i f_i c_i = G_0 C_0 .$$

Now if the substitution of the i'th hardware macro involves
the net addition of $g_i$ gates, but reduces the number of cycles
to $c_i'$, then the average gate cycles become

$$N_i = (G_0 + g_i)\{C_0 - (c_i - c_i') f_i\} = (G_0 + g_i)(C_0 - f_i \Delta C_i)$$

in which

$$\Delta C_i = c_i - c_i' .$$

The substitution will effect a net decrease ($N_i < N_0$) provided

$$g_i < \frac{G_0 f_i \Delta C_i}{C_0 - f_i \Delta C_i} .$$

After the substitution, the number of gates is

$$G_i = G_0 + g_i$$

and the average number of cycles per instruction

$$C_i = C_0 - f_i \Delta C_i ,$$

in terms of which

$$g_i < \frac{G_0 f_i \Delta C_i}{C_i} .$$

Suppose the macros are ordered such that the first (i=1)
gives the smallest gate-cycle product. The insertion of a
second macro (the j'th, say) will involve the net addition of
$g_{1j}$ gates and will result in deletion of $\Delta C_1 + \Delta C_j$ cycles and
will be productive, provided

$$g_{1j} < \frac{G_1 f_j \Delta C_j}{C_{1j}} .$$

By comparison, the condition for a productive substitution for
the j'th macro, if the first hardware macro were not used, would
be

$$g_j < \frac{G_0 f_j \Delta C_j}{C_j} .$$

But $G_1 > G_0$ and $C_{1j} < C_j$, hence the upper bound on $g_{1j}$
exceeds that on $g_j$,
and the insertion of the j'th macro might be advantageous after
the insertion of the first one but not before. This possibility
is illustrated by the following hypothetical example in which
two macroinstructions out of a large group are candidates for
hardware macros.


i        f        c       c'       g
1      0.25     150       3     6,000
2      0.10     200       3    10,000

$G_0$ = 10,000 (the number of gates in the original processor),

$\sum_{i>2} f_i c_i$ = 17.5 (the average number of cycles per
instruction, excluding the first two),

$C_0$ = 75,

$G_0 C_0$ = 750 K (the original gate-cycle product
per average instruction).

Then, if the first macro is inserted

$C_1$ = 75 - 0.25 (150 - 3) = 38.25

$G_1$ = 10,000 + 6,000 = 16,000

$G_1 C_1$ = 612,000 ,

which has reduced the gate cycles per instruction from 750 K
to 612 K, but the further insertion of the second macro results
in

$C_{12}$ = 38.25 - 0.1 (200 - 3) = 18.55

$G_{12}$ = 16,000 + 10,000 = 26,000

$G_{12} C_{12}$ = 482 K

and the insertion of the first two macros reduces the gate
cycles per instruction by over 35 percent to 482 K.

If, however, the first macro were not inserted, the use of
the second macro only would result in

$C_2$ = 75 - 0.1 (200 - 3) = 55.3

$G_2$ = 10,000 + 10,000 = 20,000

$G_2 C_2$ = 1106 K ,

an increase in the gate cycles per instruction.
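
The arithmetic of this example is easy to reproduce. The short Python sketch below recomputes the gate-cycle products for each combination of macros using the figures of the hypothetical example; the function and dictionary key names are illustrative only.

```python
def gate_cycles(g0, c0, macros):
    """Gate cycles per average instruction after inserting the given hardware macros.

    Each macro is a dict with relative frequency "f", software cycle count "c",
    hardware cycle count "c_hw", and net added gates "g".
    """
    gates = g0 + sum(m["g"] for m in macros)
    cycles = c0 - sum(m["f"] * (m["c"] - m["c_hw"]) for m in macros)
    return gates * cycles


MACRO_1 = {"f": 0.25, "c": 150, "c_hw": 3, "g": 6_000}
MACRO_2 = {"f": 0.10, "c": 200, "c_hw": 3, "g": 10_000}

baseline = gate_cycles(10_000, 75, [])                   # about 750 K
first_only = gate_cycles(10_000, 75, [MACRO_1])          # about 612 K
both = gate_cycles(10_000, 75, [MACRO_1, MACRO_2])       # about 482 K
second_only = gate_cycles(10_000, 75, [MACRO_2])         # about 1106 K
```

The second macro alone raises the product above the baseline, yet adding it after the first macro lowers the product further, which is the point of the example.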

V. THE IMPLICATIONS OF IMPROVED LITHOGRAPHIC
RESOLUTION FOR MOS LSICs
IN MILITARY SYSTEMS

The continual increase in speed, density, and reliability
of integrated circuits which has occurred over the past decade
has been made possible largely by a progressive reduction in
feature sizes--i.e., in the minimum dimensions of circuit
elements--brought about by higher lithographic resolution in the
manufacturing process. Since the extraordinary benefits of
circuit integration stem from circuit miniaturization, the
technology of high-resolution lithography is being pursued
energetically in many countries. In laboratory systems, this
technology has already approached the limits of optical (light)
microscopy beyond which the circuit details will no longer be
directly visible. Although numerous practical difficulties are
being encountered (in realignment at successive processing
stages and the necessity for multi-level metallization, for
example) there seems to be little doubt that high-resolution
lithographic processes (in the one micron region if not beyond)
will eventually be used for the production of high-performance
circuits.

The following approximate analysis of MOS ICs examines the
dependence of throughput capacity (the product of clock speed
and number of gates), chip area, power dissipation, etc., on
feature size. A comparison of the calculated throughput
capacities (per circuit) with the requirements of several important
military systems indicates that these systems could, in
principle, be embodied in one--or relatively few--large-scale
integrated circuits with significant reductions in system support
costs (Ref. 3) using MOS circuits with feature sizes scaled by
a factor of 2 or 3 from current production practice.

However, the realization of these benefits will necessitate
considerable engineering investments on several fronts. At
these dimensions, the greatest throughput capacities are achieved
at much higher clock speeds (~ 100 MHz) and lower logic voltages*
(~ 1 volt) than are found in today's military systems. In those
applications where individual signal or data sources can utilize
the full capacity of the circuit, the higher clock speed would
involve reengineering, but need involve no architectural changes.
In other cases, however, the substitution of one (or a few)
higher-speed circuits for larger assemblies of circuits (having
equal throughput capacity) would likely require extensive
redesign work.

This analysis (which is based on semi-empirical formulas,
specifically those for the MOS circuit characteristics and the
mean interconnect lengths relative to the total number of gates)
shows that the throughput capacity per circuit, for fixed total
power, reaches a maximum with respect to the number of gates.
If more gates than this are placed on the chip (which is the
case with microprocessors), the power dissipation in the
resistive elements forces a disproportionate reduction in circuit
speed. With fewer gates, the speed cannot be proportionately
increased because of the power dissipation in the I/O stages or
the cut-off frequency (of the transistor with the capacitive
load of the interconnect line). The latter becomes a
progressively more dominant limitation for smaller circuit dimensions.

Taking those parameters which give the greatest throughput
capacity, assuming a fixed and equal power dissipation, the
clock frequency scales in nearly inverse proportion to circuit
dimensions; the chip area shows only a weak dependence, except
below about 1 µm; the number of gates rises at a progressively
higher rate with decreasing circuit dimensions (increasing
overall by nearly three orders of magnitude, with a tenfold reduction
in circuit dimensions); and throughput capacity, which combines
the dependence of frequency and gate count, increases by over
three orders of magnitude over the same range.

*Some of today's VLSICs operate with logic swings of a volt or
less but are amplified to 5 V at the board level for
compatibility with standard TTL; the voltage-matching stages
consume typically 40 mW each, which accounts, in part, for the
multi-watt dissipation of these circuits.

In summary, the calculated performance of MOS VLSICs with
feature sizes reduced by a factor of three or more from current
production practice substantially exceeds that of any circuits
currently in use.

The current of an MOS transistor is usually given (Ref. 36)
by the equation

$$I_D = \mu C_g \frac{W}{L}\left[(V_G - V_T)\,v - \tfrac{1}{2}v^2\right]$$

in the linear range; here

$C_g$ = gate capacitance per unit area
W = gate width
L = gate length
$\mu$ = surface mobility
$V_G$ = gate voltage
$V_T$ = threshold voltage
v = drain-to-source voltage,

while the saturation current is

$$I_s = \tfrac{1}{2}\mu C_g \frac{W}{L}(V_G - V_T)^2 \quad \text{for } v \ge V_G - V_T .$$

From these relationships, the characteristics of MOS logic
elements can be determined; the simplest logic element being
the inverter.
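
The two current expressions are the usual square-law model of the MOS transistor, with the linear-range formula holding for drain-to-source voltages below $V_G - V_T$ and the saturation formula above it. The following Python sketch evaluates the model; returning zero current when the gate voltage is below threshold is an added assumption, and the function name is illustrative.

```python
def drain_current(mu, cg, w, l, vg, vt, v):
    """Square-law MOS drain current in amperes.

    mu: surface mobility (cm^2/V-sec), cg: gate capacitance per unit area (F/cm^2),
    w, l: gate width and length (cm), vg: gate voltage, vt: threshold voltage,
    v: drain-to-source voltage (all in volts).
    """
    k = mu * cg * w / l
    overdrive = vg - vt
    if overdrive <= 0.0:
        return 0.0  # assumption: device treated as cut off below threshold
    if v < overdrive:
        return k * (overdrive * v - 0.5 * v * v)  # linear range
    return 0.5 * k * overdrive * overdrive        # saturation
```

The two branches agree at v = $V_G - V_T$, so the modeled current is continuous across the boundary between the linear range and saturation.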


A typical NMOS inverter consists of an enhancement mode
NMOS transistor with the input signal applied to its gate and
with another depletion mode NMOS transistor used for a load.
The load transistor has its gate connected to its source and
hence operates as a constant current device over most of the
range of voltage from its source to its drain. The circuit
and its current voltage relationships are depicted in Fig. 3.

The stable states are indicated by the circled points
labeled Q and for low and high output voltages, respectively.

In  the  addendum  to  this  discussion,  the  speed  of  such  an 
inverter,  as  related  to  its  design  parameters,  is  first 
estimated.  The  power  drain  is  then  estimated,  and  these 
results  are  utilized  to  find  interrelationships  between  number 
of  such  gates,  chip  area,  and  other  scaling  factors. 

The total average power dissipated in the circuit consists
of two parts: the one produced by current flow in the quiescent
state, the other produced by transient current flow during the
charging and discharging of the interconnect structure and
transistor gates. Appendix A contains a detailed analysis of
the operating characteristics of an MOS inverter in which the
required operating conditions (unity logic gain, maximum speed)
are given in terms of the circuit parameters. Formulas for the
transient ($P_t$) and quiescent ($P_q$) power dissipation and cut-off
frequency are also developed, namely:


$$P_t = \frac{1}{2}\,\frac{C}{\tau}\,(V_H - V_L)^2$$


and 


Pq  =  F  YCS  L  (VH  "  VT) 

$$f_c = \frac{1}{16}\,\frac{\mu C_g}{C}\,\frac{W}{L}\,(V_H - V_T) .$$

In these relationships $V_H$ is the power supply voltage,
$V_L$ the low (logic) level voltage, $V_T$ the threshold voltage, $C_g$ the
gate capacitance per unit area, C the total average capacitive
load, and $\tau$ the actual clock cycle. Specifically,

$$C = C_0 w l + C_g W L F$$

where L and W are the gate length and width; l, w the
interconnect length and width; $C_0$ is the capacitance per unit area of
interconnect; and F the fan out.

The effect of interconnect loading on the cut-off frequency
is noteworthy, as will become apparent when the characteristics
for maximum throughput capacity are examined. Clearly, the
appropriate interconnect capacitance (appearing in the
denominator of the equation for cut-off frequency) is the worst case;
i.e., the interconnect within the circuit having the largest
value of $C_0 w l$.

The resulting value of $f_c$ appears generally to be very
much lower than the cut-off frequency defined by

$$f_0 = \frac{g_m}{2\pi C_g} = \frac{\mu (V_G - V_T)}{2\pi L^2}$$

(see Ref. 36, p. 55 and Chap. 4), which does not take account of
the interconnect lines.

The transistors which drive the output tabs, pins, and
external interconnects comprise the other major source of power
which must be dissipated by the IC package. The total power
required for this purpose (at the same clock period) is

$$P_p = \frac{1}{2}\,\frac{C_p}{\tau}\,N_p V_p^2$$

where $C_p$ is the total capacitive load faced by the output
transistors (typically $C_p$ ~ 15 pF), $V_p$ the voltage swing at the
board level and $N_p$ the number of output line drivers. When
$V_p \neq V$, voltage matching stages must be included which often
consume appreciable power.

The  total  power  dissipated  by  both  the  pins  and  gates 
(assuming  the  use  of  the  same  signal  voltage  swings  at  the  chip 
and  board  level) 

pT  - 1  t:  t  Vp  +  Df™g  + 1  yCgTa  l  V 

a. 


in which $D_f$ is the fraction of the gates which are actually
switched in a typical cycle (typically about 1/4 in random
logic).

With respect to scaling, it is assumed that V, L, W, w, l
all scale by the same factor, s:

$$W = W_0 s$$

$$L = L_0 s$$

$$w = w_0 \sqrt{LW} = w_0 \sqrt{L_0 W_0}\; s$$

$$V = V_0 s$$

The average interconnect length l has been shown (Refs.
37, 38, and 39) to obey a relationship of the form

$$l = a\,x\,N_g^{\,b}$$

in which x is the gate pitch (mean distance between gates); writing this
in the normalized form $a x = l_0 \sqrt{L_0 W_0}\; s$, then (taking b = 0.23)

$$l = l_0 \sqrt{L_0 W_0}\; s\, N_g^{0.23}$$
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and  in  terms  of  this 


T 


16L  2 
o 

yV 
1  o 


C  w  1 
0  o  o 


N 


0.23 


g 


+  P 


s 


this  being  the  minimum  clock  period,  not  the  actual  (t  ). 

CL 
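
The dependence of the minimum clock period on gate count and scale factor can be sketched numerically. The block below reads the expression as T = 16 L0^2/(mu V0) [(C0/Cg) w0 l0 Ng^0.23 + F] s and uses the representative 1978 values quoted in this section. Two inputs are assumptions of this sketch rather than values taken from the text: the exponent of the interconnect capacitance C0 (taken as 3 x 10^-8 F/cm^2) and a fan-out F of 4.

```python
# Representative 1978 parameters (cgs units) as quoted in the text.
L0 = 6e-4       # channel length, cm
V0 = 5.0        # signal voltage, volts
MU = 600.0      # NMOS mobility, cm^2/V-sec
CG = 35e-9      # gate capacitance per unit area, F/cm^2
C0 = 3e-8       # interconnect capacitance per unit area, F/cm^2 (exponent assumed)
W_REL = 0.1     # w0, normalized interconnect width
L_REL = 9.0     # l0, normalized interconnect length coefficient
FAN_OUT = 4     # F, assumed; not listed among the quoted parameters


def min_clock_period_s(n_gates, s=1.0):
    """Minimum clock period in seconds for n_gates gates at scale factor s."""
    prefactor = 16.0 * L0 ** 2 / (MU * V0)
    bracket = (C0 / CG) * W_REL * L_REL * n_gates ** 0.23 + FAN_OUT
    return prefactor * bracket * s


for n in (100, 1000, 10000):
    print(n, f"{min_clock_period_s(n) * 1e9:.1f} ns")
```

The period grows only slowly with gate count (through the 0.23 power of the interconnect term) and falls in direct proportion to the scale factor.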

The following parameters are representative of current (1978) production design rules:

$V_0 = 5$ volts

$C_g = 35 \times 10^{-9}$ F/cm$^2$

$C_0 = 3 \times 10^{-8}$ F/cm$^2$

$w_0 = 0.1$

$l_0 = 9$

$W_0 = 6 \times 10^{-4}$ cm

$L_0 = 6 \times 10^{-4}$ cm

$\mu = 600$ cm$^2$/V-sec (NMOS)

$\mu = 190$ cm$^2$/V-sec (PMOS)

$C_p = 15 \times 10^{-12}$ F

for which (taking $D_f = 1/4$),

$$P = 0.066\,N_g\,s^2 + \left[0.03\,N_g^{1.227} + 0.16\,N_g\right]\frac{s^4}{\tau} + 187.5\,N_p\,\frac{s^2}{\tau} ,$$

where $P$ is in milliwatts and $\tau$ in nanoseconds.
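
The numerical power expression lends itself to direct evaluation. The following is a minimal sketch, assuming the gate and interconnect charging terms scale as s^4/tau and the pin-driver term as s^2/tau, which is how the superscripts in the expression are read here. The function and argument names are illustrative.

```python
def total_power_mw(n_gates, n_drivers, tau_ns, s=1.0):
    """Total chip power in milliwatts for a clock period tau_ns (nanoseconds)."""
    # Static ("DC") dissipation of the gates.
    static = 0.066 * n_gates * s ** 2
    # Charging of interconnect and gate capacitance (includes Df = 1/4).
    switching = (0.03 * n_gates ** 1.227 + 0.16 * n_gates) * s ** 4 / tau_ns
    # Charging of the off-chip load by the output line drivers.
    pins = 187.5 * n_drivers * s ** 2 / tau_ns
    return static + switching + pins


p = total_power_mw(n_gates=1000, n_drivers=50, tau_ns=100.0)
print(f"{p:.1f} mW")
```

For a 1000-gate chip with 50 output drivers at a 100 ns clock period, the static term and the pin drivers dominate; the on-chip charging term is only a few milliwatts.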

The total number of I/O pins which are needed is sometimes summarized by Rent's rule:

$$N_I = K\,N_g^{0.57} ,$$

in which $K$ is the average number of lines entering and leaving a gate, six or more (Ref. 40); less than half of the $N_I$ would be the output line drivers. This simple formula overestimates the requirement for pins for large numbers of gates (greater than 1000 or so). We use the assumption (see Appendix B) that above this level no more than 50 output line drivers are needed.
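
A short sketch of the pin estimate described above: Rent's rule with an exponent of 0.57, combined with the working assumption that no more than 50 output line drivers are needed above about 1000 gates. The value K = 6 is the lower bound quoted in the text; treating exactly half of the pins as output drivers is an upper-bound assumption of this sketch, since the text says only "less than half".

```python
def rent_pins(n_gates, k=6.0, exponent=0.57):
    """Total I/O pin estimate from Rent's rule, K * Ng**0.57."""
    return k * n_gates ** exponent


def output_drivers(n_gates, driver_fraction=0.5, cap=50, cap_threshold=1000):
    """Output line drivers: a fraction of the Rent pins, capped for large chips."""
    drivers = driver_fraction * rent_pins(n_gates)
    if n_gates > cap_threshold:
        # Above the threshold the text assumes at most `cap` drivers.
        drivers = min(drivers, cap)
    return drivers


for n in (100, 1000, 10000):
    print(n, round(rent_pins(n)), round(output_drivers(n)))
```

The uncapped rule calls for several hundred pins at 1000 gates, which illustrates why the simple formula is regarded as an overestimate at that level.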


The tolerable power level (Ref. 41) is given by

$$P = \frac{T_J - T_a}{\rho} ,$$

in which $T_J$ is the junction temperature, $T_a$ the ambient temperature, and $\rho$ the thermal resistance of the package. For a 40-pin package, typically $\rho = 30$°C per watt. Therefore, if the ambient temperature sometimes reaches 125°C and the junction temperature 175°C,

$$P \approx 1.6 \text{ watt}.$$

At higher power, special packaging ($\rho < 30$°C per watt) must be used. Actually, most circuits dissipate less than 1 watt.
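
The thermal limit is a one-line calculation: the allowable temperature rise divided by the package thermal resistance. The sketch below reproduces the 40-pin package example and also inverts the relation to show what thermal resistance a higher-power part would need; the 3-watt figure is a hypothetical input chosen for illustration.

```python
def tolerable_power_w(t_junction_c, t_ambient_c, thermal_resistance_c_per_w):
    """Power the package can dissipate: (TJ - Ta) / thermal resistance."""
    return (t_junction_c - t_ambient_c) / thermal_resistance_c_per_w


def required_thermal_resistance(t_junction_c, t_ambient_c, power_w):
    """Thermal resistance (deg C per watt) needed to dissipate power_w."""
    return (t_junction_c - t_ambient_c) / power_w


print(f"{tolerable_power_w(175.0, 125.0, 30.0):.2f} W")
# Hypothetical 3 W part: the package must do better than 30 deg C per watt.
print(f"{required_thermal_resistance(175.0, 125.0, 3.0):.1f} deg C/W")
```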

The active chip area consists of the area covered by the interconnect structure ($A_i$), the area covered by the gates themselves ($A_g$), and the area covered by the I/O pins and line drivers ($A_p$).

The chip area covered by the interconnect structure is

$$A_i \propto N_g\,w\,l ,$$

while the chip area occupied by gates is

$$A_g = A_0\,s^2\,N_g ,$$

where $A_0 = 2$ mils$^2$ (typically) and the area per I/O pin


A  *  7.5  x  103  ~  mils 


t 


The total chip area is

$$A_T = 2\,w\,l\,N_g + A_0\,s^2\,N_g + A_p N_p$$

$$A_T = \left[3.125 \times 10^{-5}\,s^2 N_g + 4 \times 10^{-6}\,s^2 N_g^{1.226} + 0.04\right]\ \text{cm}^2$$

Using  these  relationships,  a  family  of  curves  has  been 
generated  showing  the  number  of  gates  per  circuit  relative  to 
the  clock  period  for  a  fixed  power  dissipation  (300  mW),  for 
various  chip  areas  and  scaling  factors  (Fig.  5). 
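
The kind of trade-off plotted in such curves can be sketched by solving the numerical power expression for the clock period at a fixed power budget and pairing the result with the numerical area expression. This is an illustration under stated assumptions, not the procedure used to generate the figure: the powers of the scale factor (s^2 static, s^4/tau charging, s^2/tau pins) follow one reading of the printed expression, 50 output drivers are assumed throughout, and the separate lower bound set by the minimum clock period is ignored.

```python
def clock_period_for_power_ns(n_gates, power_mw=300.0, s=1.0, n_drivers=50):
    """Clock period (ns) at which total power equals power_mw, or None."""
    static = 0.066 * n_gates * s ** 2
    if static >= power_mw:
        # Static dissipation alone exceeds the budget at any clock rate.
        return None
    dynamic = ((0.03 * n_gates ** 1.227 + 0.16 * n_gates) * s ** 4
               + 187.5 * n_drivers * s ** 2)
    return dynamic / (power_mw - static)


def chip_area_cm2(n_gates, s=1.0):
    """Total chip area in cm^2 from the numerical area expression."""
    return (3.125e-5 * s ** 2 * n_gates
            + 4e-6 * s ** 2 * n_gates ** 1.226
            + 0.04)


for s in (1.0, 0.5, 0.25):
    for n in (1000, 4000, 16000):
        tau = clock_period_for_power_ns(n, s=s)
        if tau is None:
            print(f"s={s} Ng={n}: static power exceeds 300 mW")
        else:
            throughput = n / (tau * 1e-9)  # gate cycles per second
            print(f"s={s} Ng={n}: tau={tau:.1f} ns, "
                  f"area={chip_area_cm2(n, s):.3f} cm^2, "
                  f"throughput={throughput:.2e}")
```

At unity scale factor the static term alone reaches 300 mW near 4500 gates, so larger gate counts at this power level become possible only as the scale factor is reduced.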


[FIGURE 5. Throughput capacity in relation to minimum feature size. Legend: Ng = number of gates per chip; f = clock frequency; A = circuit area. Horizontal axis: scale factor. Note: unity scale factor corresponds approximately to current production practice.]

APPENDIX A

NMOS  INVERTER  OPERATION 

1. With reference to Fig. A-1: when the input voltage is low, the current in the switching transistor is zero and the output voltage is high, at power supply potential.

2. When the input voltage is high (at power supply potential), the output falls to a low voltage determined by the division of voltage between the switch transistor and the load transistor.

3. When the input makes an instantaneous low-to-high transition, the current of transistor 1 rises along the upper dotted line of Fig. A-1 in the direction of the arrows, then follows along the line $U = V_H$, approaching the stable point 1 asymptotically. The current in transistor 2 moves to the left along the curve $I_2$ until it also reaches the point 1. The charging current for the output circuit capacitance is given by the roughly triangular area between these two curves.

4. When the input makes an instantaneous high-to-low transition, the current in the switching transistor falls immediately to zero and remains there. The current in the load transistor moves to the right along the line $I_2$ until it reaches the point 2 asymptotically.

For estimating purposes, the small portion of the curve $I_2$ between the plateau and the point 2 will be assumed to remain at the plateau level, so that $I_2$ will be a horizontal line.

[FIGURE A-1. NMOS inverter, depletion mode load, enhancement mode switch. Vertical axis: current I, in units of $KX_1$.]

Also for simplicity the parabolic curve of the current in the switching transistor will be replaced by a straight line.


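$$I_1 = \frac{K X_1}{2}\,(U - V_T)^2 \quad \text{initially, when } V = V_H ,$$

$$I_1 = K X_1\left[(U - V_T)\,V - \frac{V^2}{2}\right] \quad \text{when } V = V_L ,$$

$$I_2 = \frac{K X_2}{2}\,V_c^2 ,$$

where $V_c$ is the cut-off voltage for the depletion mode load transistor.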

The approximated initial charging current is, for a low-to-high input transition,

$$I_1 - I_2 = -C\,\frac{dV}{dt} = \frac{K X_1}{2}\,(V_H - V_T)^2 - \frac{K X_2}{2}\,V_c^2 \quad \text{when } V = V_H .$$


When the output falls to $V_L$, the charging current falls to zero; in between $V_H$ and $V_L$ the current in the straight line approximation is

$$-C\,\frac{dV}{dt} = \frac{V - V_L}{V_H - V_L}\cdot\frac{K X_1}{2}\left[(V_H - V_T)^2 - \frac{X_2}{X_1}\,V_c^2\right] .$$

Then

$$V - V_L = (V_H - V_L)\,\exp\left\{-\frac{K X_1\left[(V_H - V_T)^2 - \dfrac{X_2}{X_1}\,V_c^2\right]}{2C\,(V_H - V_L)}\;t\right\} .$$

The time constant is, for the output $V_H \rightarrow V_L$ transition,

$$\tau_1 = \frac{2\,(V_H - V_L)\,C}{K X_1\left[(V_H - V_T)^2 - \dfrac{X_2}{X_1}\,V_c^2\right]} .$$


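For the output $V_L \rightarrow V_H$ transition, the behavior is simpler. $I_1$ is zero and $I_2$ is constant, so that the output rises at a constant rate through

$$\Delta V = V_H - V_L ,$$

so that the time required for this transition is

$$\tau_2 = \frac{2C\,(V_H - V_L)}{K X_1\,\dfrac{X_2}{X_1}\,V_c^2} .$$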


The first transition is 95 percent complete when $t = 3\tau_1$. It is desirable to have equal rise and fall times, so that $X_2/X_1$ can be chosen to make $\tau_2 = 3\tau_1$:

$$\frac{2C\,(V_H - V_L)}{K X_1\,\dfrac{X_2}{X_1}\,V_c^2} = 3\,\frac{2C\,(V_H - V_L)}{K X_1\left[(V_H - V_T)^2 - \dfrac{X_2}{X_1}\,V_c^2\right]}$$

$$3\,\frac{X_2}{X_1}\,V_c^2 = (V_H - V_T)^2 - \frac{X_2}{X_1}\,V_c^2$$

and

$$\frac{X_2}{X_1} = \frac{(V_H - V_T)^2}{4\,V_c^2}$$

for equal rise and fall times. Thus, if $V_L = V_T = V_c = 1$ volt (with $V_H = 5$ volts), then

$$\frac{X_2}{X_1} = 4 .$$
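
The equal rise and fall time condition can be checked numerically. The sketch below evaluates the two transition times from the expressions in this appendix, taking the load-to-switch ratio as X2/X1 = (VH - VT)^2 / (4 Vc^2), and confirms that the constant-current transition time equals three exponential time constants. The normalization C = 1 and K X1 = 1 and the 5-volt high level are assumptions made purely for illustration.

```python
def load_to_switch_ratio(v_h, v_t, v_c):
    """X2/X1 giving equal rise and fall times."""
    return (v_h - v_t) ** 2 / (4.0 * v_c ** 2)


def tau_1(c, kx1, v_h, v_l, v_t, v_c, ratio):
    """Time constant of the exponential VH -> VL output transition."""
    return 2.0 * (v_h - v_l) * c / (kx1 * ((v_h - v_t) ** 2 - ratio * v_c ** 2))


def tau_2(c, kx1, v_h, v_l, v_c, ratio):
    """Duration of the constant-current VL -> VH output transition."""
    return 2.0 * c * (v_h - v_l) / (kx1 * ratio * v_c ** 2)


# Assumed operating point: VH = 5 V, VL = VT = Vc = 1 V; C and K*X1 normalized to 1.
ratio = load_to_switch_ratio(v_h=5.0, v_t=1.0, v_c=1.0)
t1 = tau_1(1.0, 1.0, 5.0, 1.0, 1.0, 1.0, ratio)
t2 = tau_2(1.0, 1.0, 5.0, 1.0, 1.0, ratio)
print(ratio, t1, t2)
```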


With this value, the current in the on condition is

$$K X_1\,\frac{X_2}{X_1}\,\frac{V_c^2}{2} = \frac{K X_1}{8}\,(V_H - V_T)^2 .$$

In the off condition, the current is zero.

When operating at the highest speed, the duty cycle is 1/2, so that the average current is half that in the on condition.

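The transition time when rise time equals fall time is $\tau_2$ with the proper value of $X_2/X_1$ inserted:

$$\tau_2 = \frac{8C}{K X_1}\,\frac{V_H - V_L}{(V_H - V_T)^2} .$$

Since $V_T$ is approximately $V_L$,

$$\tau_2 = \frac{8C}{K X_1}\,\frac{1}{V_H - V_L}$$

for equal rise and fall times, or (with $K = \mu C_g$ and $X_1 = W_1/L_1$)

$$\tau_2 = \frac{8\,C\,L_1}{\mu C_g W_1\,(V_H - V_L)} .$$

Minimum cycle times must not be less than twice this. Denoting the minimum cycle time by $T_c$,

$$T_c \geq \frac{16\,C\,L_1}{\mu C_g W_1\,(V_H - V_L)} ;$$

but

$$C = C_0\,w\,l + C_g W_1 L_1 F .$$

Then

$$T_c \geq \frac{16\left(C_0\,w\,l\,L_1/W_1 + C_g L_1^2 F\right)}{\mu C_g\,(V_H - V_L)} .$$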

The power necessary to charge and discharge the capacitance load is $C\,(V_H - V_L)^2 / 2T_c$ at maximum speed. At less than maximum speed, at frequency $f$, this is

$$P_1 = \frac{1}{2}\,f\,C\,(V_H - V_L)^2 .$$

Since the highest allowable frequency is proportional to $1/C$, the capacitance cancels out of the expression for power at the highest allowable clock frequency $1/T_c$:

$$P_1 = \frac{C\,(V_H - V_L)^2}{2}\cdot\frac{\mu C_g W_1\,(V_H - V_L)}{16\,C\,L_1} = \frac{\mu C_g W_1}{32\,L_1}\,(V_H - V_L)^3$$

at top speed, proportional to frequency for lower clock frequencies.
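
The cancellation of the load capacitance at top speed can be verified numerically. The sketch below takes the maximum clock frequency as fc = (1/16) mu Cg (W1/L1) (VH - VL) / C, applies the charging-power expression P1 = (1/2) f C (VH - VL)^2, and shows that the result does not depend on C. The mobility and gate capacitance are the representative NMOS values from the main text; the 4-volt swing, the unity W1/L1 ratio, and the two load values are assumptions chosen for illustration.

```python
MU_CG = 600.0 * 35e-9  # mu * Cg for NMOS, in A/V^2


def max_clock_frequency_hz(c_load_f, w_over_l, swing_v):
    """Highest allowable clock frequency, 1/Tc, for a load of c_load_f farads."""
    return MU_CG * w_over_l * swing_v / (16.0 * c_load_f)


def charging_power_w(c_load_f, freq_hz, swing_v):
    """Capacitance charging power, (1/2) f C (VH - VL)^2."""
    return 0.5 * freq_hz * c_load_f * swing_v ** 2


swing = 4.0  # VH - VL in volts (assumed)
for c_load in (1e-13, 1e-12):
    f_max = max_clock_frequency_hz(c_load, w_over_l=1.0, swing_v=swing)
    p1 = charging_power_w(c_load, f_max, swing)
    print(f"C = {c_load:.0e} F: fc = {f_max / 1e6:.1f} MHz, P1 = {p1 * 1e6:.1f} uW")
```

A tenfold increase in load lowers the maximum frequency tenfold, while the charging power per gate at that frequency is unchanged.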
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The  average  power  at  low  speeds  with  a  duty  cycle  of 
1/2  is 


p  -v  ffifiV 

o  “  p  2  X1  2 


Utilizing the value of $X_2/X_1 = (V_H - V_T)^2 / 4V_c^2$ for equal rise and fall times, the average power at low speeds is


v  kx  (vH-vTr  YC  W  2 

P  =  _E _ _ i —  =  -  g  (v  -v 

o  lS  16  L.  ' VH  T;  * 


Then the total power at maximum speed is

$$P_0 + P_1 = \frac{\mu C_g W_1}{16\,L_1}\left[(V_H - V_T)^2 + \frac{1}{2}\,(V_H - V_L)^3\right] .$$

Since $V_T$ and $V_L$ are nearly equal, it is of interest to notice that the DC power is approximately 1/2 of the capacitance charging power at maximum speed if $V_H - V_L = 4$ volts. The two are equal if $V_H - V_L = 2$ volts.
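
The two comparisons just stated can be checked from the bracketed terms as printed, (VH - VT)^2 for the DC power and (1/2)(VH - VL)^3 for the charging power at maximum speed, since the common prefactor cancels in the ratio. The specific voltage levels below (VL = VT = 1 volt) are assumed for illustration.

```python
def dc_to_charging_ratio(v_h, v_l, v_t):
    """Ratio of the DC term to the charging term, with the common prefactor dropped."""
    dc_term = (v_h - v_t) ** 2
    charging_term = 0.5 * (v_h - v_l) ** 3
    return dc_term / charging_term


# Assumed levels: VL = VT = 1 V.
print(dc_to_charging_ratio(v_h=5.0, v_l=1.0, v_t=1.0))  # swing of 4 volts
print(dc_to_charging_ratio(v_h=3.0, v_l=1.0, v_t=1.0))  # swing of 2 volts
```

With VT equal to VL the ratio reduces to 2 / (VH - VL), so the DC share grows as the signal swing is reduced.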

Summary: "DC" power at 50 percent duty cycle (independent of speed) is

$$P_0 = \frac{1}{16}\,\frac{W_1}{L_1}\,\mu C_g\,(V_H - V_T)^2 .$$

Capacitance charging power at maximum speed is

$$P_1 = \frac{1}{32}\,\frac{W_1}{L_1}\,\mu C_g\,(V_H - V_L)^3 .$$

Maximum cycle frequency is

$$f_c = \frac{1}{16}\,\mu C_g\,\frac{W_1}{L_1}\,\frac{V_H - V_L}{C} .$$

APPENDIX B

PIN-TO-GATE RATIO

Landman  and  Russo  (Ref.  40)  partitioned  four  logic  struc¬ 
tures  into  circuits  of  various  sizes  from  1  gate  per  circuit  to 
full  integration  (the  complete  logic  design  on  one  chip).  The 
logic  structures  contained  671,  3,000,  9,900,  and  12,700  NOR 
gates,  respectively.  Their  data  has  been  replotted,  in  a 
different  form,  in  Fig.  B-l  together  with  a  series  of  bit  slice 
ALUs,  a  TTL  gate  array,  and  an  8  x  8  multiplier. 

To  be  sure,  much  lower  pin-to-gate  ratios  pertain  to  mode- 
switched  and  pipelined  circuits.  However,  mode-switched 
circuits  have  a  commensurately  lower  relative  throughput  since 
only  part  of  the  circuitry  (corresponding  to  the  selected  mode) 
is  active  during  any  cycle.  From  this  it  is  reasonable  to  infer 
that  for  non-pipelined  general  logic  the  number  of  pins  to  be 
provided  in  relation  to  the  number  of  gates  must  be  near  or 
above the line labeled M, except possibly at much higher levels of integration. This clearly suggests the necessity for developing new 100- to 200-pin packages for high throughput VLSI circuits.

FIGURE B-1. Pin-to-gate ratio


